grundic / confluence-page-copier
Python script for creating recursive copy of Confluence pages.
License: MIT License
Putting an unsupported placeholder such as {any_text} or {any_number} in the destination title template results in misleading errors.
Putting {random}, {bad}, {blah}, etc. in the destination title makes the script fail with a KeyError.
Putting {1}, {2}, etc. in the destination title makes the script fail with an IndexError (index out of range).
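One way to turn these misleading errors into a clear message is to check the template before formatting it. The following is a minimal sketch, not the script's actual code: the function name `validate_title_template` is hypothetical, and the allowed set `{title}` and `{counter}` is an assumption based on the placeholders that appear in these issues.

```python
from string import Formatter

# Assumption: these are the only placeholders the title template supports.
ALLOWED_FIELDS = frozenset(["title", "counter"])


def validate_title_template(template, allowed=ALLOWED_FIELDS):
    """Return the template unchanged, or raise ValueError naming the bad placeholder."""
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field is None:
            # Plain text segment without a placeholder.
            continue
        if field not in allowed:
            raise ValueError(
                "Unknown placeholder {%s} in title template; allowed: %s"
                % (field, ", ".join(sorted(allowed)))
            )
    return template
```

Calling such a check once at start-up would reject `{blah}` or `{1}` with a message that names the offending placeholder, instead of a bare KeyError or IndexError raised later from `str.format`.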
--dry-run mode simulates creating the page in the destination, but then actually searches for it in Confluence using the ID "-1" and crashes:
c:\python27\python.exe copier.py --username="" --password="" --endpoint="https://_.atlassian.net/wiki" --dst-space=~rf --src-title="test page 1" --src-space="DOC" --dst-title-template="{title}" --dst-parent-id=66322435 --dry-run
DEBUG:confl-copier:Searching page by space 'DOC' and title 'test page 1'
DEBUG:confl-copier:Found 1 page(s)
DEBUG:confl-copier:Searching page by space '~rf' and title 'test page 1'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '26574918'
DEBUG:confl-copier:Searching page by id '66322435'
INFO:confl-copier:Copying [DOC]:'Documentation'/'test page 1' => [~rf]:'R... F...’s Home'/'test page 1'
INFO:api-proxy:[DRY-RUN] create_new_content({'body': {'storage': {'representation': 'storage', 'value': u'
some contents
'}}, 'title': u'test page 1', 'type': u'page', 'ancestors': [{'id': '66322435'}], 'space': {'key': '~rf'}})
The script copies pages correctly after removing --dry-run:
c:\python27\python.exe copier.py --username="" --password="" --endpoint="https://*.atlassian.net/wiki" --dst-space=~rf --src-title="test page 1" --src-space="DOC" --dst-title-template="{title}" --dst-parent-id=66322435
DEBUG:confl-copier:Searching page by space 'DOC' and title 'test page 1'
DEBUG:confl-copier:Found 1 page(s)
DEBUG:confl-copier:Searching page by space '~rf' and title 'test page 1'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '26574918'
DEBUG:confl-copier:Searching page by id '66322435'
INFO:confl-copier:Copying [DOC]:'Documentation'/'test page 1' => [~rf]:'R... F...’s Home'/'test page 1'
DEBUG:confl-copier:Searching page by id '74514464'
DEBUG:confl-copier:Searching page by space '~rf' and title 'child page 1'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '74514552'
INFO:confl-copier:Copying [DOC]:'Documentation'/'child page 1' => [~rf]:'test page 1'/'child page 1'
INFO:confl-copier:Copying 1 attachment(s)
DEBUG:confl-copier:Downloading 'image2016-6-15.png' attachment
DEBUG:confl-copier:Creating new attachment 'image2016-6-15.png'
DEBUG:confl-copier:Removing temp directory 'c:\users\roman\appdata\local\temp\tmprha6rg'
Hi, your project confluence-page-copier requires "boltons==16.4.1" as a dependency. After analyzing the source code, we found that some other versions of boltons are also suitable without affecting your project, i.e., boltons 16.3.0, 16.3.1, 16.4.0. Therefore, we suggest loosening the dependency on boltons from "boltons==16.4.1" to "boltons>=16.3.0,<=16.4.1" to avoid possible conflicts when importing more packages or in downstream projects that use confluence-page-copier.
May I open a pull request to loosen the dependency on boltons?
By the way, could you please tell us whether this kind of dependency analysis would make maintaining dependencies easier during your development?
For your reference, here are details in our analysis.
Your project confluence-page-copier(commit id: 311c1f6) directly uses 2 APIs from package boltons.
boltons.cacheutils.cachedmethod, boltons.cacheutils.LRU.__init__
From these, 6 functions are then called indirectly, including 4 of boltons' internal APIs and 2 external APIs, as follows (omitting repeated function occurrences).
[/grundic/confluence-page-copier]
+--boltons.cacheutils.cachedmethod
| +--boltons.cacheutils.CachedMethod.__init__
| | +--operator.attrgetter
+--boltons.cacheutils.LRU.__init__
| +--boltons.cacheutils.RLock.__init__
| +--threading.RLock
| +--boltons.cacheutils.LRU._init_ll
| +--boltons.cacheutils.LRU.update
We scanned boltons versions [16.3.0, 16.3.1, 16.4.0] against 16.4.1. The changed functions (diffs listed below) do not intersect with any function or API mentioned above (called either directly or indirectly by this project).
diff: 16.4.1(original) 16.3.0
['boltons.funcutils.wraps', 'boltons.statsutils.Stats._calc_iqr', 'boltons.funcutils.FunctionBuilder.remove_arg', 'boltons.funcutils.FunctionBuilder', 'boltons.statsutils.Stats.describe', 'boltons.funcutils.FunctionBuilder.get_func', 'boltons.ecoutils._escape_shell_args', 'boltons.socketutils.NetstringMessageTooLong', 'boltons.tbutils.ParsedException', 'boltons.ecoutils.main', 'boltons.funcutils._indent', 'boltons.funcutils.FunctionBuilder.__init__', 'boltons.funcutils.mro_items', 'boltons.statsutils.Stats.__init__', 'boltons.ecoutils.get_profile', 'boltons.funcutils.FunctionBuilder.get_invocation_str', 'boltons.statsutils.Stats.get_histogram_counts', 'boltons.funcutils.FunctionBuilder._argspec_to_dict', 'boltons.tbutils.ExceptionInfo', 'boltons.socketutils.ConnectionClosed', 'boltons.tbutils.ExceptionInfo.get_formatted_exception_only', 'boltons.iterutils.chunked_iter', 'boltons.statsutils.Stats', 'boltons.tbutils.ParsedException.to_string', 'boltons.socketutils.NetstringInvalidSize', 'boltons.ecoutils._fake_json_dumps', 'boltons.ecoutils.get_python_info', 'boltons.funcutils.FunctionBuilder.lambda', 'boltons.funcutils.FunctionBuilder.get_sig_str', 'boltons.statsutils.describe', 'boltons.funcutils.FunctionBuilder.get_defaults_dict', 'boltons.statsutils.Stats.format_histogram', 'boltons.statsutils.Stats._get_bin_bounds', 'boltons.funcutils.FunctionBuilder._compile', 'boltons.socketutils.Timeout', 'boltons.funcutils.FunctionBuilder.from_func', 'boltons.ecoutils.get_profile_json', 'boltons.statsutils.format_histogram_counts', 'boltons.statsutils.Stats.trim_relative']
diff: 16.4.1(original) 16.3.1
['boltons.funcutils.wraps', 'boltons.statsutils.Stats._calc_iqr', 'boltons.funcutils.FunctionBuilder.remove_arg', 'boltons.funcutils.FunctionBuilder', 'boltons.statsutils.Stats.describe', 'boltons.funcutils.FunctionBuilder.get_func', 'boltons.ecoutils._escape_shell_args', 'boltons.tbutils.ParsedException', 'boltons.ecoutils.main', 'boltons.funcutils._indent', 'boltons.funcutils.FunctionBuilder.__init__', 'boltons.funcutils.mro_items', 'boltons.statsutils.Stats.__init__', 'boltons.ecoutils.get_profile', 'boltons.funcutils.FunctionBuilder.get_invocation_str', 'boltons.statsutils.Stats.get_histogram_counts', 'boltons.funcutils.FunctionBuilder._argspec_to_dict', 'boltons.tbutils.ExceptionInfo', 'boltons.tbutils.ExceptionInfo.get_formatted_exception_only', 'boltons.iterutils.chunked_iter', 'boltons.statsutils.Stats', 'boltons.tbutils.ParsedException.to_string', 'boltons.ecoutils._fake_json_dumps', 'boltons.ecoutils.get_python_info', 'boltons.funcutils.FunctionBuilder.lambda', 'boltons.funcutils.FunctionBuilder.get_sig_str', 'boltons.statsutils.describe', 'boltons.funcutils.FunctionBuilder.get_defaults_dict', 'boltons.statsutils.Stats.format_histogram', 'boltons.statsutils.Stats._get_bin_bounds', 'boltons.funcutils.FunctionBuilder._compile', 'boltons.funcutils.FunctionBuilder.from_func', 'boltons.ecoutils.get_profile_json', 'boltons.statsutils.format_histogram_counts', 'boltons.statsutils.Stats.trim_relative']
diff: 16.4.1(original) 16.4.0
['boltons.statsutils.Stats._calc_iqr', 'boltons.statsutils.Stats', 'boltons.statsutils.Stats.__init__', 'boltons.statsutils.Stats.format_histogram', 'boltons.statsutils.Stats._get_bin_bounds', 'boltons.statsutils.Stats.describe', 'boltons.statsutils.describe', 'boltons.statsutils.Stats.get_histogram_counts', 'boltons.statsutils.format_histogram_counts', 'boltons.tbutils.ExceptionInfo', 'boltons.statsutils.Stats.trim_relative', 'boltons.tbutils.ExceptionInfo.get_formatted_exception_only']
As for other packages, the APIs of @outside_package_name are called by boltons in the call graph, and the dependencies on these packages also stay the same in our suggested versions, thus avoiding any outside conflict.
Therefore, we believe that it is quite safe to loosen your dependency on boltons from "boltons==16.4.1" to "boltons>=16.3.0,<=16.4.1". This will improve the applicability of confluence-page-copier and reduce the possibility of further dependency conflicts with other projects/packages.
Page order is not preserved in the copied tree: pages are stored alphabetically. For example,
given a tree whose children are ordered, but not alphabetically:
root1
    B child
    A child
the script goes through the pages in the correct order, from top to bottom:
copy root1
copy B child
copy A child
but the resulting tree has its children in alphabetical order:
root1
    A child
    B child
Questions are:
Hi,
Really appreciate you sharing that piece of code.
I have been playing around with it trying to copy some page trees on my Confluence. I have successfully made it work on small trees with a simple structure. (even cross-space).
However, when trying to copy more 'advanced' structures (with possibly a few dozen pages, and/or pages containing attachments etc.) I run into errors:
E.g.
$ python copier.py --username="*" --password="*" --endpoint="https://*.atlassian.net/wiki" --src-space="DTS" --src-title="Releases Documentation" --dst-space="FT" --dst-title-template="{title} (Copied using copier.py)"
DEBUG:confl-copier:Searching page by space 'DTS' and title 'Releases Documentation'
DEBUG:confl-copier:Found 1 page(s)
DEBUG:confl-copier:Setting ancestor id to 917507
DEBUG:confl-copier:Searching page by space 'FT' and title 'Releases Documentation (Copied using copier.py)'
DEBUG:confl-copier:Found 0 page(s)
INFO:confl-copier:Copying 'DTS/Releases Documentation' => 'FT/Releases Documentation (Copied using copier.py)'
Traceback (most recent call last):
File "copier.py", line 430, in <module>
recursion_limit=args.recursion_limit
File "copier.py", line 95, in copy
page_copy = self._copy_page(source, ancestor_id, dst_space_key, dst_title)
File "copier.py", line 272, in _copy_page
'ancestors': [{'id': ancestor_id}],
File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 794, in create_new_content
headers={"Content-Type": "application/json"}, callback=callback)
File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 140, in _service_post_request
return self._service_request("POST", *args, **kwargs)
File "/usr/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 116, in _service_request
response.raise_for_status()
File "/usr/lib/python2.7/site-packages/requests/models.py", line 840, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://dexstr.atlassian.net/wiki/rest/api/content
Not the most informative of error messages...
Have you ever run into this kind of problem, or do you have an idea as to why it happens or how to work around it?
Thanks anyway,
Regards
Francois
There is an issue when attempting to use {counter} in the title template when copying pages in user spaces.
I ran the following scenarios on a page with only text in it:
$ python2 copier.py --src-id=27183904 --dst-title-template="{title} {counter}" --endpoint="..." --username="..." --password="..."
DEBUG:confl-copier:Searching page by id '27183904'
DEBUG:confl-copier:Setting destination space key to source's value '~andy_boutin'
Traceback (most recent call last):
File "copier.py", line 422, in <module>
recursion_limit=args.recursion_limit
File "copier.py", line 70, in copy
dst_space_key, dst_title_template = self._init_destination_page(source, dst_space_key, dst_title_template)
File "copier.py", line 183, in _init_destination_page
counter = self._get_title_counter(space_key=dst_space_key, title=source['title'], template=title_template)
File "copier.py", line 196, in _get_title_counter
space=space_key, title=title
File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 322, in search_content
return self._service_get_request("rest/api/content/search", params=params, callback=callback)
File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 131, in _service_get_request
return self._service_request("GET", *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 116, in _service_request
response.raise_for_status()
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 844, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://confluence.compellent.com/rest/api/content/search?start=0&cql=space+%3D+~andy_boutin+and+title+~+%22Dummy%22
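The failing URL shows the personal space key passed to CQL unquoted (`space = ~andy_boutin`), which is a likely cause of the 400 response, since `~` is also a CQL operator. Below is a minimal sketch of building the same query with both values quoted. The helper names `quote_cql` and `build_title_search_cql` are hypothetical and are not part of copier.py or PythonConfluenceAPI.

```python
def quote_cql(value):
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def build_title_search_cql(space_key, title):
    """Build a CQL query matching pages by space key and fuzzy title."""
    return "space = %s and title ~ %s" % (quote_cql(space_key), quote_cql(title))
```

For the case in this report, `build_title_search_cql("~andy_boutin", "Dummy")` yields `space = "~andy_boutin" and title ~ "Dummy"`, which could then be passed as the `cql` parameter of the search request.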
After a standard install using the "pip install -r requirements.txt" command, I tried to run the application with Python 2 and got this error:
Traceback (most recent call last):
File "copier.py", line 11, in <module>
from PythonConfluenceAPI import ConfluenceAPI
File "c:\python27\lib\site-packages\PythonConfluenceAPI\__init__.py", line 6, in <module>
from future import standard_library
ImportError: No module named future
I then ran the command
"pip install future"
and the application started working,
so the "future" package is probably missing from the installation requirements.
The line
self.log.debug("Downloading '{name}' attachment".format(name=attachment['title']))
produced this output:
File "copier.py", line 320, in _copy_attachments
self.log.debug("Downloading '{name}' attachment".format(name=attachment['title']))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 49-51: ordinal not in range(128)
This happened when the attachment filename ended with a BOM (EF BB BF in hex): the "format" call could not encode the name and failed without reporting which attachment or filename was at fault.
As a workaround, I changed the code to the following, and it worked:
#self.log.debug("Downloading '{name}' attachment".format(name=attachment['title']))
self.log.debug("Downloading attachment:")
self.log.debug(attachment['title'])
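An alternative to splitting the message in two is to let the logging module do the substitution and to log an escaped form of the name, so that invisible characters such as a BOM become visible. This is only a sketch written in Python 3 syntax, whereas the report above concerns Python 2. The helper names are hypothetical; the logger name `confl-copier` is taken from the log output in these issues.

```python
import logging

log = logging.getLogger("confl-copier")


def describe_attachment(title):
    """Return an ASCII-only representation; a trailing BOM shows up as \\ufeff."""
    return ascii(title)


def log_download(title):
    # Passing the value as a logging argument defers formatting to the logging
    # module; the escaped form cannot trigger an encoding error on output.
    log.debug("Downloading %s attachment", describe_attachment(title))
```

With this approach, a filename ending in a BOM would be logged as `'image.png\ufeff'`, which both avoids the encoding failure and identifies the problematic attachment.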
I am having an issue copying any pages that have an attachment.
The page gets copied, but then an error is thrown when attempting to copy the attachment. It leaves an image that says "? Unknown Attachment" in its place.
I get the same error when attempting the copy in a personal space and in a regular space. I have also tried different attachment types, and tested on a page with just text and a single attachment.
$ python2 copier.py --src-id=27183904 --dst-title-template="{title} new" --endpoint="..." --username="..." --password="..."
DEBUG:confl-copier:Searching page by id '27183904'
DEBUG:confl-copier:Setting destination space key to source's value '~andy_boutin'
DEBUG:confl-copier:Setting ancestor id to 2884752
DEBUG:confl-copier:Searching page by space '~andy_boutin' and title 'Dummy new'
DEBUG:confl-copier:Found 0 page(s)
INFO:confl-copier:Copying '~andy_boutin/Dummy' => '~andy_boutin/Dummy new'
INFO:confl-copier:Copying 1 attachment(s)
DEBUG:confl-copier:Downloading 'useravatar.png' attachment
DEBUG:confl-copier:Removing temp directory '/tmp/tmpEMyf53'
Traceback (most recent call last):
File "copier.py", line 422, in <module>
recursion_limit=args.recursion_limit
File "copier.py", line 106, in copy
self._copy_attachments(source, page_copy_id)
File "copier.py", line 292, in _copy_attachments
content = self._client._service_get_request(sub_uri=attachment['_links']['download'][1:], raw=True)
File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 131, in _service_get_request
return self._service_request("GET", *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/PythonConfluenceAPI/api.py", line 111, in _service_request
uri = urljoin(self.uri_base, sub_uri)
File "/usr/local/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 418, in urljoin
base, url, _coerce_result = _coerce_args(base, url)
File "/usr/local/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 115, in _coerce_args
raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
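The final frames suggest that `urljoin` received one text argument and one byte-string argument. One possible workaround is to coerce both values to text before joining. The sketch below uses Python 3's `urllib.parse`, which raises the same TypeError for mixed arguments. The helper names `to_text` and `join_download_url` are hypothetical, and the base URL in the usage note is a placeholder.

```python
from urllib.parse import urljoin


def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; leave text values untouched."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def join_download_url(base, download_link):
    """Join a base URI and an attachment download link, both coerced to text."""
    # Strip the leading slash so the link resolves relative to the base path,
    # mirroring the [1:] slice visible in the traceback above.
    return urljoin(to_text(base), to_text(download_link).lstrip("/"))
```

For example, `join_download_url(b"https://example.invalid/wiki/", "/download/attachments/1/a.png")` returns a single text URL instead of raising.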
I'm getting an error when running this and not really sure how to diagnose or fix it.
Traceback (most recent call last):
File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 508, in <module>
recursion_limit=args.recursion_limit
File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 136, in copy
skip_attachments=skip_attachments
File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 115, in copy
self._copy_attachments(source, page_copy_id)
File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 347, in _copy_attachments
attachment_comment=attachment_comment
File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 363, in _create_attachment
return self._process_attachment(url, attachment_name, attachment_content, attachment_type, attachment_comment)
File "C:\Users\rober\Downloads\confluence-page-copier-master\confluence-page-copier-master\copier.py", line 376, in _process_attachment
prepared_request = req.prepare()
File "C:\Python27\lib\site-packages\requests\models.py", line 251, in prepare
hooks=self.hooks,
File "C:\Python27\lib\site-packages\requests\models.py", line 298, in prepare
self.prepare_body(data, files, json)
File "C:\Python27\lib\site-packages\requests\models.py", line 449, in prepare_body
(body, content_type) = self._encode_files(files, data)
File "C:\Python27\lib\site-packages\requests\models.py", line 152, in _encode_files
fdata = fp.read()
AttributeError: 'NoneType' object has no attribute 'read'
The ancestor id is set to a different page than the one given by --dst-parent-id, so the page is copied to the root:
C:\Users\roman\Documents\documentation\confluence-page-copier-master>c:\python27\python.exe copier.py --username="" --password="" --endpoint="https://*.atlassian.net/wiki" --src-id=81133576 --dst-parent-id=81133572 --dst-space="~rf" --src-space="~rf"
DEBUG:confl-copier:Searching page by id '81133576'
DEBUG:confl-copier:Setting ancestor id to 66322435
DEBUG:confl-copier:Searching page by space '~rf' and title 'good attach page (1)'
DEBUG:confl-copier:Found 0 page(s)
DEBUG:confl-copier:Searching page by id '66322435'
INFO:confl-copier:Copying [~rf]:'Roman Fominych’s Home'/'good attach page' => [~rf]:'Roman Fominych’s Home'/'good attach page (1)'
INFO:confl-copier:Copying 1 attachment(s)
DEBUG:confl-copier:Downloading 'BET-255.png' attachment
DEBUG:confl-copier:Creating new attachment 'BET-255.png'
How system works now:
How system should work:
The current approach suffers from file-naming incompatibilities between Confluence and the file system of the host running the script. Confluence allows attachment names containing national characters, special characters such as a BOM, and characters that Windows prohibits in filenames, such as ":". If attachments were not saved to the host file system at all, none of these problems would arise.
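A minimal sketch of the in-memory idea follows. It assumes the downloaded attachment is already available as bytes and that the upload goes through the `files` argument of requests, as the tracebacks above suggest. The function name `build_upload_files` is hypothetical, and the multipart field name `file` is an assumption about what the Confluence attachment endpoint expects.

```python
import io


def build_upload_files(name, content, mime_type="application/octet-stream"):
    """Build a requests-style `files` mapping backed by memory, not a temp file."""
    if content is None:
        # Fail early with a clear message instead of the later
        # "'NoneType' object has no attribute 'read'" error.
        raise ValueError("No content downloaded for attachment %r" % name)
    # The name is only sent as metadata, so characters that are illegal in
    # host filenames (":" on Windows, a trailing BOM) never touch the disk.
    return {"file": (name, io.BytesIO(content), mime_type)}
```

The returned mapping could be passed as `files=` when preparing the upload request, so that no temporary directory is needed for attachments.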