grundic / confluence-page-copier

Python script for creating recursive copy of Confluence pages.

License: MIT License

Python 100.00%

confluence-page-copier's Issues

Bracket Errors {...}

Putting {any_text} or {any_number} in the destination title template results in misleading errors.

Putting {random}, {bad}, {blah}, etc. in the destination title fails the script with a KeyError.
Putting {1}, {2}, etc. in the destination title fails the script with an IndexError (index out of range).

Possibly add a check that raises a clear error when this happens?
Possibly document a warning that users should not put anything enclosed in brackets in destination titles.
Maybe you'll want to allow literal brackets in the actual titles by supporting an escape character.
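
One way to implement the suggested check is to parse the template before formatting it. The sketch below is only an illustration, not the script's actual code: `validate_title_template` and `ALLOWED_FIELDS` are hypothetical names, and the allowed set assumes that {title} and {counter} are the only supported placeholders (they are the only ones that appear in the reports on this page).

```python
from string import Formatter

# Assumption: only these placeholders are supported in the title template.
ALLOWED_FIELDS = {"title", "counter"}


def validate_title_template(template):
    """Raise a descriptive ValueError for unsupported placeholders."""
    unknown = [
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        # field_name is None for plain text, including escaped {{ and }}
        if field_name is not None and field_name not in ALLOWED_FIELDS
    ]
    if unknown:
        raise ValueError(
            "Unsupported placeholder(s) in title template: {0}. "
            "Write literal braces as '{{{{' and '}}}}'.".format(
                ", ".join("{" + name + "}" for name in unknown)
            )
        )
    return template
```

Python's str.format already treats doubled braces as an escape, so "{{draft}} {title}" renders as "{draft} X" for the title "X". That may be enough to cover the escape-character request without extra code.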

--dry-run mode is broken

--dry-run mode simulates creating the page in the destination, but then actually searches Confluence for it using the ID "-1" and crashes:

c:\python27\python.exe copier.py --username="" --password="" --endpoint="https://_.atlassian.net/wiki" --dst-space=~rf --src-title="test page 1" --src-space="DOC" --dst-title-template="{title}" --dst-parent-id=66322435 --dry-run
DEBUG:confl-copier:Searching page by space 'DOC' and title 'test page 1'
DEBUG:confl-copier:Found 1 page(s)
DEBUG:confl-copier:Searching page by space '~rf' and title 'test page 1'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '26574918'
DEBUG:confl-copier:Searching page by id '66322435'
INFO:confl-copier:Copying [DOC]:'Documentation'/'test page 1' => [~rf]:'R... F...’s Home'/'test page 1'
INFO:api-proxy:[DRY-RUN] create_new_content({'body': {'storage': {'representation': 'storage', 'value': u'

some contents

'}}, 'title': u'test page 1', 'type': u'page', 'ancestors': [{'id': '66322435'}], 'space': {'key': '~rf'}})
DEBUG:confl-copier:Searching page by id '74514464'
DEBUG:confl-copier:Searching page by space '~rf' and title 'child page 1'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '-1'
Traceback (most recent call last):
  File "copier.py", line 470, in <module>
    recursion_limit=args.recursion_limit
  File "copier.py", line 138, in copy
    skip_attachments=skip_attachments
  File "copier.py", line 104, in copy
    page_copy = self._copy_page(source, ancestor_id, dst_space_key, dst_title)
  File "copier.py", line 269, in _copy_page
    dst_parent_page = self._find_page(content_id=ancestor_id)
  File "c:\python27\lib\site-packages\boltons\cacheutils.py", line 542, in __call__
    ret = cache[key] = self.func(*args, **kwargs)
  File "copier.py", line 148, in _find_page
    expand=self.EXPAND_FIELDS
  File "c:\python27\lib\site-packages\PythonConfluenceAPI\api.py", line 225, in get_content_by_id
    callback=callback)
  File "c:\python27\lib\site-packages\PythonConfluenceAPI\api.py", line 131, in _service_get_request
    return self._service_request("GET", *args, **kwargs)
  File "c:\python27\lib\site-packages\PythonConfluenceAPI\api.py", line 116, in _service_request
    response.raise_for_status()
  File "c:\python27\lib\site-packages\requests\models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://.atlassian.net/wiki/rest/api/content/-1?expand=body.storage%2Cspace%2Cancestors%2Cversion

The same command copies the pages correctly after removing --dry-run:

c:\python27\python.exe copier.py --username="" --password="" --endpoint="https://*.atlassian.net/wiki" --dst-space=~rf --src-title="test page 1" --src-space="DOC" --dst-title-template="{title}" --dst-parent-id=66322435
DEBUG:confl-copier:Searching page by space 'DOC' and title 'test page 1'
DEBUG:confl-copier:Found 1 page(s)
DEBUG:confl-copier:Searching page by space '~rf' and title 'test page 1'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '26574918'
DEBUG:confl-copier:Searching page by id '66322435'
INFO:confl-copier:Copying [DOC]:'Documentation'/'test page 1' => [~rf]:'R... F...’s Home'/'test page 1'
DEBUG:confl-copier:Searching page by id '74514464'
DEBUG:confl-copier:Searching page by space '~rf' and title 'child page 1'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '74514552'
INFO:confl-copier:Copying [DOC]:'Documentation'/'child page 1' => [~rf]:'test page 1'/'child page 1'
INFO:confl-copier:Copying 1 attachment(s)
DEBUG:confl-copier:Downloading 'image2016-6-15.png' attachment
DEBUG:confl-copier:Creating new attachment 'image2016-6-15.png'
DEBUG:confl-copier:Removing temp directory 'c:\users\roman\appdata\local\temp\tmprha6rg'
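
The dry-run log suggests that the simulated page gets the placeholder id '-1', which is then looked up through the real API. A minimal sketch of one possible guard follows. `DryRunPages`, `register`, and `find` are hypothetical names rather than the script's API, and `fetch_page` stands in for whatever performs the real lookup by id.

```python
import itertools


class DryRunPages(object):
    """Keeps pages 'created' during a dry run so later lookups avoid the API."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page        # hypothetical real lookup by id
        self._simulated = {}
        self._next_id = itertools.count(1)

    def register(self, page):
        """Store a simulated page under a unique negative id and return it."""
        fake_id = str(-next(self._next_id))  # "-1", "-2", ...
        stored = dict(page, id=fake_id)
        self._simulated[fake_id] = stored
        return stored

    def find(self, content_id):
        """Return the simulated page if known, otherwise do the real lookup."""
        key = str(content_id)
        if key in self._simulated:
            return self._simulated[key]
        return self._fetch_page(key)
```

Giving each simulated page its own negative id keeps parent and child distinguishable when a whole tree is copied in dry-run mode.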

Suggest to loosen the dependency on boltons

Hi, your project confluence-page-copier pins "boltons==16.4.1" as a dependency. After analyzing the source code, we found that some other versions of boltons are also suitable and do not affect your project, i.e., boltons 16.3.0, 16.3.1, and 16.4.0. We therefore suggest loosening the dependency from "boltons==16.4.1" to "boltons>=16.3.0,<=16.4.1" to avoid possible conflicts when more packages are imported or when downstream projects use confluence-page-copier.

May I open a pull request to loosen the dependency on boltons?

By the way, could you please tell us whether this kind of dependency analysis would help make dependencies easier to maintain during your development?

For your reference, here are the details of our analysis.

Your project confluence-page-copier (commit id: 311c1f6) directly uses 2 APIs from package boltons.

boltons.cacheutils.cachedmethod, boltons.cacheutils.LRU.__init__

Through these, 6 functions are called indirectly: 4 of boltons's internal APIs and 2 external APIs, as follows (repeated occurrences omitted).

[/grundic/confluence-page-copier]
+--boltons.cacheutils.cachedmethod
|      +--boltons.cacheutils.CachedMethod.__init__
|      |      +--operator.attrgetter
+--boltons.cacheutils.LRU.__init__
|      +--boltons.cacheutils.RLock.__init__
|      +--threading.RLock
|      +--boltons.cacheutils.LRU._init_ll
|      +--boltons.cacheutils.LRU.update

We scanned boltons versions 16.3.0, 16.3.1, and 16.4.0 against 16.4.1. The changed functions (diffs listed below) do not intersect with any function or API mentioned above (whether called directly or indirectly by this project).

diff: 16.4.1(original) 16.3.0
['boltons.funcutils.wraps', 'boltons.statsutils.Stats._calc_iqr', 'boltons.funcutils.FunctionBuilder.remove_arg', 'boltons.funcutils.FunctionBuilder', 'boltons.statsutils.Stats.describe', 'boltons.funcutils.FunctionBuilder.get_func', 'boltons.ecoutils._escape_shell_args', 'boltons.socketutils.NetstringMessageTooLong', 'boltons.tbutils.ParsedException', 'boltons.ecoutils.main', 'boltons.funcutils._indent', 'boltons.funcutils.FunctionBuilder.__init__', 'boltons.funcutils.mro_items', 'boltons.statsutils.Stats.__init__', 'boltons.ecoutils.get_profile', 'boltons.funcutils.FunctionBuilder.get_invocation_str', 'boltons.statsutils.Stats.get_histogram_counts', 'boltons.funcutils.FunctionBuilder._argspec_to_dict', 'boltons.tbutils.ExceptionInfo', 'boltons.socketutils.ConnectionClosed', 'boltons.tbutils.ExceptionInfo.get_formatted_exception_only', 'boltons.iterutils.chunked_iter', 'boltons.statsutils.Stats', 'boltons.tbutils.ParsedException.to_string', 'boltons.socketutils.NetstringInvalidSize', 'boltons.ecoutils._fake_json_dumps', 'boltons.ecoutils.get_python_info', 'boltons.funcutils.FunctionBuilder.lambda', 'boltons.funcutils.FunctionBuilder.get_sig_str', 'boltons.statsutils.describe', 'boltons.funcutils.FunctionBuilder.get_defaults_dict', 'boltons.statsutils.Stats.format_histogram', 'boltons.statsutils.Stats._get_bin_bounds', 'boltons.funcutils.FunctionBuilder._compile', 'boltons.socketutils.Timeout', 'boltons.funcutils.FunctionBuilder.from_func', 'boltons.ecoutils.get_profile_json', 'boltons.statsutils.format_histogram_counts', 'boltons.statsutils.Stats.trim_relative']

diff: 16.4.1(original) 16.3.1
['boltons.funcutils.wraps', 'boltons.statsutils.Stats._calc_iqr', 'boltons.funcutils.FunctionBuilder.remove_arg', 'boltons.funcutils.FunctionBuilder', 'boltons.statsutils.Stats.describe', 'boltons.funcutils.FunctionBuilder.get_func', 'boltons.ecoutils._escape_shell_args', 'boltons.tbutils.ParsedException', 'boltons.ecoutils.main', 'boltons.funcutils._indent', 'boltons.funcutils.FunctionBuilder.__init__', 'boltons.funcutils.mro_items', 'boltons.statsutils.Stats.__init__', 'boltons.ecoutils.get_profile', 'boltons.funcutils.FunctionBuilder.get_invocation_str', 'boltons.statsutils.Stats.get_histogram_counts', 'boltons.funcutils.FunctionBuilder._argspec_to_dict', 'boltons.tbutils.ExceptionInfo', 'boltons.tbutils.ExceptionInfo.get_formatted_exception_only', 'boltons.iterutils.chunked_iter', 'boltons.statsutils.Stats', 'boltons.tbutils.ParsedException.to_string', 'boltons.ecoutils._fake_json_dumps', 'boltons.ecoutils.get_python_info', 'boltons.funcutils.FunctionBuilder.lambda', 'boltons.funcutils.FunctionBuilder.get_sig_str', 'boltons.statsutils.describe', 'boltons.funcutils.FunctionBuilder.get_defaults_dict', 'boltons.statsutils.Stats.format_histogram', 'boltons.statsutils.Stats._get_bin_bounds', 'boltons.funcutils.FunctionBuilder._compile', 'boltons.funcutils.FunctionBuilder.from_func', 'boltons.ecoutils.get_profile_json', 'boltons.statsutils.format_histogram_counts', 'boltons.statsutils.Stats.trim_relative']

diff: 16.4.1(original) 16.4.0
['boltons.statsutils.Stats._calc_iqr', 'boltons.statsutils.Stats', 'boltons.statsutils.Stats.__init__', 'boltons.statsutils.Stats.format_histogram', 'boltons.statsutils.Stats._get_bin_bounds', 'boltons.statsutils.Stats.describe', 'boltons.statsutils.describe', 'boltons.statsutils.Stats.get_histogram_counts', 'boltons.statsutils.format_histogram_counts', 'boltons.tbutils.ExceptionInfo', 'boltons.statsutils.Stats.trim_relative', 'boltons.tbutils.ExceptionInfo.get_formatted_exception_only']

As for other packages, the APIs of @outside_package_name are called by boltons in the call graph, and the dependencies on these packages also stay the same in our suggested versions, thus avoiding any outside conflict.

Therefore, we believe that it is quite safe to loosen your dependency on boltons from "boltons==16.4.1" to "boltons>=16.3.0,<=16.4.1". This will improve the applicability of confluence-page-copier and reduce the possibility of further dependency conflicts with other projects/packages.
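
To make the proposed range concrete, the sketch below checks which versions "boltons>=16.3.0,<=16.4.1" would accept. It is a simplified stand-in for pip's version matching and handles plain dotted numbers only. `parse_version` and `in_suggested_range` are hypothetical helper names.

```python
# requirements.txt line proposed in the issue:
#   boltons>=16.3.0,<=16.4.1


def parse_version(text):
    """Turn '16.4.1' into (16, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def in_suggested_range(version, low="16.3.0", high="16.4.1"):
    """True when low <= version <= high, both bounds inclusive."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


if __name__ == "__main__":
    for candidate in ("16.2.0", "16.3.0", "16.3.1", "16.4.0", "16.4.1", "16.5.0"):
        print(candidate, in_suggested_range(candidate))
```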

Order of pages is not preserved

Page order in the copied tree is not preserved; the copied pages end up sorted alphabetically. For example:

Given a tree with ordered children that are not in alphabetical order:
root1
  B child
  A child

The script goes through the pages in the correct order, from top to bottom:
copy root1
copy B child
copy A child

The resulting tree has its children in alphabetical order:
root1
  A child
  B child

Questions are:

Can we preserve page order when issuing "COPY" command to Confluence API?
Or maybe we can use some additional "ORDER" command after copying each page?
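
If a reorder step after copying turns out to be possible, the sequence of moves can be derived from the source order alone. The sketch below only plans and simulates those moves. Whether the Confluence instance in use exposes a call for placing a page after a sibling is an assumption, and `move_after` is a hypothetical callback standing in for it.

```python
def plan_reorder(source_order):
    """Return (page, predecessor) pairs that rebuild source_order."""
    return list(zip(source_order[1:], source_order))


def apply_plan(current_order, plan, move_after=None):
    """Simulate the moves locally; optionally call move_after(page, predecessor)."""
    pages = list(current_order)
    for page, predecessor in plan:
        pages.remove(page)
        pages.insert(pages.index(predecessor) + 1, page)
        if move_after is not None:
            move_after(page, predecessor)  # hypothetical API call
    return pages


if __name__ == "__main__":
    plan = plan_reorder(["B child", "A child"])
    print(apply_plan(["A child", "B child"], plan))
```

Each page is placed directly after its predecessor from the source tree, so the first child never needs to move and the chain grows in source order.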

Problem when trying to copy more advanced tree structures

Hi,
I really appreciate you sharing this piece of code.
I have been playing around with it, trying to copy some page trees on my Confluence. I have successfully made it work on small trees with a simple structure (even cross-space).
However, when trying to copy more 'advanced' structures (possibly a few dozen pages, and/or pages containing attachments, etc.) I run into errors.
E.g.:

$ python copier.py --username="*" --password="*" --endpoint="https://*.atlassian.net/wiki" --src-space="DTS" --src-title="Releases Documentation" --dst-space="FT" --dst-title-template="{title} (Copied using copier.py)"
DEBUG:confl-copier:Searching page by space 'DTS' and title 'Releases Documentation'
DEBUG:confl-copier:Found 1 page(s)
DEBUG:confl-copier:Setting ancestor id to 917507
DEBUG:confl-copier:Searching page by space 'FT' and title 'Releases Documentation (Copied using copier.py)'
DEBUG:confl-copier:Found 0 page(s)
INFO:confl-copier:Copying 'DTS/Releases Documentation' => 'FT/Releases Documentation (Copied using copier.py)'
Traceback (most recent call last):
  File "copier.py", line 430, in <module>
    recursion_limit=args.recursion_limit
  File "copier.py", line 95, in copy
    page_copy = self._copy_page(source, ancestor_id, dst_space_key, dst_title)
  File "copier.py", line 272, in _copy_page
    'ancestors': [{'id': ancestor_id}],
  File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 794, in create_new_content
    headers={"Content-Type": "application/json"}, callback=callback)
  File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 140, in _service_post_request
    return self._service_request("POST", *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 116, in _service_request
    response.raise_for_status()
  File "/usr/lib/python2.7/site-packages/requests/models.py", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://dexstr.atlassian.net/wiki/rest/api/content

Not the most evocative of error messages...
Have you ever run into this kind of problem, or do you have an idea as to why it happens or how to work around it?

Thanks anyway,
Regards
Francois
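
A 500 status alone says little, but the traceback shows requests raising HTTPError with `response=self`, so the server's response is available on the exception. The sketch below builds a more informative message from it. `describe_http_error` is a hypothetical helper, and whether Confluence puts a useful explanation in the body of a 500 response is an assumption.

```python
def describe_http_error(error, limit=2000):
    """Build a log message that includes the server's response body, if any."""
    response = getattr(error, "response", None)
    if response is None:
        return u"{0}".format(error)
    # Truncate so a large HTML error page does not flood the log.
    body = (getattr(response, "text", u"") or u"")[:limit]
    return u"{0}\nResponse body: {1}".format(error, body)
```

It could be called from an `except requests.exceptions.HTTPError as error:` block around the copy call, logging the returned text before re-raising.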

Issue w/ {counter} in title template on user spaces

There is an issue when attempting to use {counter} in the title template when copying pages in user spaces.

I ran the following scenarios on a page with only text in it:

Regular space w/ out {counter} in template title - no error.
Regular space w/ {counter} in template title - no error.
Personal space w/ out {counter} in template title - no error.
Personal space w/ {counter} in template title - error:

$ python2 copier.py --src-id=27183904 --dst-title-template="{title} {counter}" --endpoint="..." --username="..." --password="..."
DEBUG:confl-copier:Searching page by id '27183904'
DEBUG:confl-copier:Setting destination space key to source's value '~andy_boutin'
Traceback (most recent call last):
  File "copier.py", line 422, in <module>
    recursion_limit=args.recursion_limit
  File "copier.py", line 70, in copy
    dst_space_key, dst_title_template = self._init_destination_page(source, dst_space_key, dst_title_template)
  File "copier.py", line 183, in _init_destination_page
    counter = self._get_title_counter(space_key=dst_space_key, title=source['title'], template=title_template)
  File "copier.py", line 196, in _get_title_counter
    space=space_key, title=title
  File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 322, in search_content
    return self._service_get_request("rest/api/content/search", params=params, callback=callback)
  File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 131, in _service_get_request
    return self._service_request("GET", *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 116, in _service_request
    response.raise_for_status()
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 844, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://confluence.compellent.com/rest/api/content/search?start=0&cql=space+%3D+~andy_boutin+and+title+~+%22Dummy%22
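
The failing URL shows the personal space key unquoted in the CQL (space = ~andy_boutin). A plausible cause, offered as an assumption rather than a confirmed diagnosis, is that `~` is also the CQL "contains" operator, so a key starting with it needs quoting. The sketch below builds the same query with quoted values. `cql_quote` and `build_title_search_cql` are hypothetical helpers, and the backslash escaping is likewise an assumption about CQL string syntax.

```python
def cql_quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"{0}"'.format(escaped)


def build_title_search_cql(space_key, title):
    """Build the space/title search with both values quoted."""
    return "space = {0} and title ~ {1}".format(
        cql_quote(space_key), cql_quote(title)
    )


if __name__ == "__main__":
    print(build_title_search_cql("~andy_boutin", "Dummy"))
```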

Installation: "future" module is missing

After a standard install using the "pip install -r requirements.txt" command, I tried to run the application with Python 2 and got this error:

Traceback (most recent call last):
  File "copier.py", line 11, in <module>
    from PythonConfluenceAPI import ConfluenceAPI
  File "c:\python27\lib\site-packages\PythonConfluenceAPI\__init__.py", line 6, in <module>
    from future import standard_library
ImportError: No module named future

So I then ran the command:
"pip install future"

and the application started working.

So the "future" package is probably missing from the installation requirements.
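
Until the requirements are fixed, a start-up check can report every missing module at once instead of failing on the first import. This is a minimal sketch. `missing_modules` is a hypothetical helper, and the module names in the example are the third-party packages that appear in the tracebacks on this page.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names taken from the tracebacks above.
    absent = missing_modules(["requests", "boltons", "future", "PythonConfluenceAPI"])
    if absent:
        print("Missing modules, try: pip install " + " ".join(absent))
```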

Log command before downloading file: fails to output file name if it cannot be successfully formatted

The line
self.log.debug("Downloading '{name}' attachment".format(name=attachment['title']))

produced this output:
  File "copier.py", line 320, in _copy_attachments
    self.log.debug("Downloading '{name}' attachment".format(name=attachment['title']))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 49-51: ordinal not in range(128)

This happened when the filename ended with a BOM (EF BB BF in hex): "format" could not build the message and failed without reporting which attachment/filename was the bad one.

As a workaround, I changed it to the following code and it worked:
#self.log.debug("Downloading '{name}' attachment".format(name=attachment['title']))
self.log.debug("Downloading attachment:".format(name=attachment['title']))
self.log.debug(attachment['title'])

file2.txt
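
An alternative to splitting the log call is to escape the filename before logging it, which also makes an invisible character such as a trailing BOM show up in the output. The sketch below is one way to do that. `printable_name` and `log_download` are hypothetical helpers, not part of copier.py.

```python
import logging


def printable_name(name):
    """Return an ASCII-only rendering of a filename with escapes made visible."""
    if isinstance(name, bytes):
        name = name.decode("utf-8", "replace")
    return name.encode("unicode_escape").decode("ascii")


def log_download(log, title):
    """Log the attachment name without risking an encoding error."""
    # Passing the argument separately lets logging do the formatting lazily.
    log.debug("Downloading '%s' attachment", printable_name(title))


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    log_download(logging.getLogger("confl-copier"), u"file2.txt\ufeff")
```

With this helper a name ending in a BOM is logged as file2.txt\ufeff, so the offending attachment can be identified.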

Page copy w/ attachment error

I am having an issue copying any page that has an attachment.

The page gets copied, but then an error is thrown when attempting to copy the attachment. It leaves an image that says "? Unknown Attachment" in its place.

I get the same error when attempting the copy in a personal space and in a regular space. I have also tried different attachment types, and tested on a page with just text and a single attachment.

$ python2 copier.py --src-id=27183904 --dst-title-template="{title} new" --endpoint="..." --username="..." --password="..."
DEBUG:confl-copier:Searching page by id '27183904'
DEBUG:confl-copier:Setting destination space key to source's value '~andy_boutin'
DEBUG:confl-copier:Setting ancestor id to 2884752
DEBUG:confl-copier:Searching page by space '~andy_boutin' and title 'Dummy new'
DEBUG:confl-copier:Found 0 page(s)
INFO:confl-copier:Copying '~andy_boutin/Dummy' => '~andy_boutin/Dummy new'
INFO:confl-copier:Copying 1 attachment(s)
DEBUG:confl-copier:Downloading 'useravatar.png' attachment
DEBUG:confl-copier:Removing temp directory '/tmp/tmpEMyf53'
Traceback (most recent call last):
  File "copier.py", line 422, in <module>
    recursion_limit=args.recursion_limit
  File "copier.py", line 106, in copy
    self._copy_attachments(source, page_copy_id)
  File "copier.py", line 292, in _copy_attachments
    content = self._client._service_get_request(sub_uri=attachment['_links']['download'][1:], raw=True)
  File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 131, in _service_get_request
    return self._service_request("GET", *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 111, in _service_request
    uri = urljoin(self.uri_base, sub_uri)
  File "/usr/local/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 418, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)
  File "/usr/local/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 115, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
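
The final TypeError is raised by urljoin when one argument is a byte string and the other is text. A minimal sketch of a workaround, assuming the base URI and the download link really do differ in type here, is to coerce both to text before joining. `to_text` and `safe_urljoin` are hypothetical helpers.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2


def to_text(value, encoding="utf-8"):
    """Decode byte strings; leave text values unchanged."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def safe_urljoin(base, sub_uri):
    """Join two URL parts after making sure both are text."""
    return urljoin(to_text(base), to_text(sub_uri))


if __name__ == "__main__":
    print(safe_urljoin(b"https://example.invalid/wiki/", u"download/attachments/1/a.png"))
```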

error

I'm getting an error when running this and not really sure how to diagnose or fix it.

Traceback (most recent call last):
  File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 508, in <module>
    recursion_limit=args.recursion_limit
  File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 136, in copy
    skip_attachments=skip_attachments
  File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 115, in copy
    self._copy_attachments(source, page_copy_id)
  File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 347, in _copy_attachments
    attachment_comment=attachment_comment
  File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 363, in _create_attachment
    return self._process_attachment(url, attachment_name, attachment_content, attachment_type, attachment_comment)
  File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 376, in _process_attachment
    prepared_request = req.prepare()
  File "C:\Python27\lib\site-packages\requests\models.py", line 251, in prepare
    hooks=self.hooks,
  File "C:\Python27\lib\site-packages\requests\models.py", line 298, in prepare
    self.prepare_body(data, files, json)
  File "C:\Python27\lib\site-packages\requests\models.py", line 449, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "C:\Python27\lib\site-packages\requests\models.py", line 152, in _encode_files
    fdata = fp.read()
AttributeError: 'NoneType' object has no attribute 'read'
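
The last frames show requests calling read() on the file content while encoding the upload, which suggests that the attachment content passed in was None, for example because the download step returned nothing. A minimal sketch of a guard that fails earlier with a clearer message follows. `require_content` is a hypothetical helper and the diagnosis is an assumption.

```python
def require_content(attachment_name, content):
    """Return the content unchanged, or fail early with a clear message."""
    if content is None:
        raise ValueError(
            "Attachment {0!r} has no content; the download step "
            "probably returned nothing".format(attachment_name)
        )
    return content
```

Calling it just before the upload request is prepared would turn the AttributeError into an error that names the attachment.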

--dst-parent parameter not working

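The --dst-parent-id value is ignored, so the page is copied to the root. In the log below, the ancestor id is set to 66322435 rather than the requested 81133572: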

C:\Users\roman\Documents\documentation\confluence-page-copier-master>c:\python27\python.exe copier.py --username="" --password="" --endpoint="https://*.atlassian.net/wiki" --src-id=81133576 --dst-parent-id=81133572 --dst-space="~rf" --src-space="~rf"
DEBUG:confl-copier:Searching page by id '81133576'
DEBUG:confl-copier:Setting ancestor id to 66322435
DEBUG:confl-copier:Searching page by space '~rf' and title 'good attach page (1)'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '66322435'
INFO:confl-copier:Copying [~rf]:'Roman Fominych’s Home'/'good attach page' => [~rf]:'Roman Fominych’s Home'/'good attach page (1)'
INFO:confl-copier:Copying 1 attachment(s)
DEBUG:confl-copier:Downloading 'BET-255.png' attachment
DEBUG:confl-copier:Creating new attachment 'BET-255.png'

Suggestion: temporary file creation is redundant

How the system works now:

The attachment is downloaded
A temporary file is created
The file is uploaded

How the system should work:

The attachment is downloaded to memory (a stream)
The attachment is uploaded from memory

Currently there is a file-naming compatibility problem between Confluence and the file system of the host running the script. Confluence allows attachment names that the host may not, for example national characters, special characters such as a BOM, and characters prohibited in Windows filenames such as ":". If the file is never saved to the host file system, none of these problems arise.
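
The in-memory flow proposed here can be sketched with io.BytesIO. The example builds a mapping in the (filename, file object, content type) shape that requests accepts for multipart uploads. `build_upload_files` is a hypothetical helper, and the "file" field name is an assumption about what the attachment endpoint expects.

```python
import io


def build_upload_files(name, content, content_type="application/octet-stream"):
    """Build a requests-style 'files' mapping backed by memory, not a temp file."""
    stream = io.BytesIO(content)  # the bytes never touch the host file system
    return {"file": (name, stream, content_type)}


if __name__ == "__main__":
    # A name that would be invalid as a Windows filename is fine here.
    files = build_upload_files(u"report: draft\ufeff.png", b"\x89PNG", "image/png")
    upload_name, upload_stream, upload_type = files["file"]
    print(repr(upload_name), upload_type, len(upload_stream.read()))
```

Because the attachment name is only ever a string in the request, characters such as ":" or a trailing BOM are passed through without being interpreted by the local file system.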