
collective.exportimport's Issues

Support exporting a lot of content

Same as with the import (solved in #4), exporting a lot of data eats up all the memory on the machine. The Python dict that holds the data during export, before it is written to a file as JSON, can become quite large if you choose to include base64-encoded binary data.

I have the use-case to export 60GB of content in files.

Options:

  • Export the blob-path and load each blob from the filesystem. That could be quite efficient.
  • Use https://pypi.org/project/jsonlines as format and write one object at a time to the filesystem. This would require changes in the import since jsonlines is not readable by json or ijson.
  • Fake using jsonlines by writing one object at a time into a file, but add a comma at the end of each line and wrap it in []. This would create a valid JSON file that the import could read.
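
The third option can be sketched with the standard library alone. This is a minimal illustration, not the add-on's implementation: `write_json_array` is a hypothetical helper, and each item is assumed to be an already serialized dict. Note that the comma is written before every item except the first, because a trailing comma after the last object would not be valid JSON.

```python
import io
import json


def write_json_array(items, fileobj):
    """Write items as one valid JSON array, one object per line.

    Only a single serialized item is held in memory at a time.
    Hypothetical helper, not part of collective.exportimport.
    """
    fileobj.write("[\n")
    first = True
    for item in items:
        if not first:
            # Separator goes before the item, so the array never ends
            # with a trailing comma.
            fileobj.write(",\n")
        fileobj.write(json.dumps(item, sort_keys=True))
        first = False
    fileobj.write("\n]\n")


# Usage: a generator keeps the export lazy.
buffer = io.StringIO()
write_json_array(({"@id": "/doc-%d" % i} for i in range(3)), buffer)
exported = json.loads(buffer.getvalue())
```

Because the result is a regular JSON array, an importer based on json or ijson should be able to read it unchanged.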

Save last export date per portal_type

We have a site with a lot of content admin traffic, and there are new articles on the old site after we exported its content. So my idea is to save the export date for each portal_type in a small portal annotation dict and add a checkbox on the @@export_content page to export types after the last_export_date... this date could/should also be shown in the type selector label ...
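
A rough sketch of the bookkeeping, assuming a plain dict stands in for the portal annotation and that every serialized item carries `@type` and a `modified` datetime. The function name `select_for_export` and the storage layout are assumptions made for illustration only.

```python
from datetime import datetime


def select_for_export(items, portal_type, storage, now):
    """Return items of portal_type modified after its last export.

    storage maps portal_type -> datetime of the last export and stands
    in for a dict kept in the portal annotations (hypothetical layout).
    The export date is updated to now after the selection.
    """
    last = storage.get(portal_type)
    selected = [
        item
        for item in items
        if item["@type"] == portal_type
        and (last is None or item["modified"] > last)
    ]
    storage[portal_type] = now
    return selected


# Usage: the second run only picks up what changed after the first one.
storage = {}
items = [
    {"@type": "News Item", "modified": datetime(2021, 1, 1)},
    {"@type": "News Item", "modified": datetime(2021, 6, 1)},
    {"@type": "Document", "modified": datetime(2021, 6, 1)},
]
first_run = select_for_export(items, "News Item", storage, datetime(2021, 3, 1))
second_run = select_for_export(items, "News Item", storage, datetime(2021, 7, 1))
```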

import_content: Dry run option with more verbose deserializer validation feedback

While test-using collective.exportimport I'm running into edge cases. I could solve these by adding more code to a subclassed export_content or import_content view in my own custom migrationhelper package.

But the edge case is often 1 field on 1-3 content items in an 8-year-old site. It's just not worth it to try to catch these in code; it is easier to fix them in the source site. See #12 as an example.

The main cause of validation errors is the schema validation running at the deserializer step on import_content in plone.restapi. And plone.restapi is catching all validation errors and rethrowing them with a generic ValidationError class that only shows the field, but not the error. (https://github.com/plone/plone.restapi/blob/f89276054088340b3ec6775db6280b1dc46f0866/src/plone/restapi/deserializer/dxcontent.py#L55-L60)

To support fixing these blips of migration issues, it would be nice to have a 'dry_run' option in the import_content that tries to create and deserialize a json, catches any validation errors and outputs a report of found original errors coming from https://github.com/plone/plone.restapi/blob/f89276054088340b3ec6775db6280b1dc46f0866/src/plone/restapi/deserializer/dxcontent.py#L45

For this to work we'd need a new feature in plone.restapi to not swallow the ValidationErrors as shown above.
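
The reporting side of such a dry run could look roughly like this. It is only a sketch: `dry_run` and the `deserialize` callable are hypothetical stand-ins for the view logic and the plone.restapi deserializer, and a real dry run would presumably also have to abort the transaction so that nothing is created.

```python
def dry_run(items, deserialize):
    """Try to deserialize every item and collect failures in a report.

    deserialize is a hypothetical callable standing in for the
    plone.restapi deserializer; it is expected to raise on invalid data.
    """
    report = []
    for item in items:
        try:
            deserialize(item)
        except Exception as error:
            # Keep going and record the problem instead of stopping.
            report.append({"@id": item.get("@id"), "error": repr(error)})
    return report


def fake_deserialize(item):
    """Illustrative validator that rejects an empty event_url."""
    if item.get("event_url") == "":
        raise ValueError("The specified URI is not valid.")


report = dry_run(
    [{"@id": "/events/a", "event_url": ""}, {"@id": "/events/b"}],
    fake_deserialize,
)
```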

html_fixer breaks links to browser-views

Relative links to browser views are replaced with a link to the current object's parent:

(Pdb++) from collective.exportimport.fix_html import html_fixer
(Pdb++) text = """<p><a href="edit">Link to a browser view</a></p>"""
(Pdb++) html_fixer(text, self.context)
'<p><a data-linktype="internal" data-val="7eb11200ba09ec174b74f24d2fb6f0c1" href="resolveuid/7eb11200ba09ec174b74f24d2fb6f0c1">Link to @@edit view</a></p>'
(Pdb++) self.context.__parent__.UID()
'7eb11200ba09ec174b74f24d2fb6f0c1'

Event with empty event_url field causes ValidationError in deserialization and "AttributeError: creation_date_migrated"

An Event exported from a Plone 4.3.6 site has the event_url field being an empty string if the field on the original object was not set.
As a result, when importing this object into a 5.2 site, this WARNING will be logged:

WARNING [collective.exportimport.import_content:310] cannot deserialize http://localhost:.../...: BadRequest([{'message': 'The specified URI is not valid.', 'field': 'event_url', 'error': 'ValidationError'}],)

This happens in ImportContent.import_new_content():

        # import using plone.restapi deserializers
        deserializer = getMultiAdapter((new, self.request), IDeserializeFromJson)
        try:
            new = deserializer(validate_all=False, data=item)
        except Exception as error:
            logger.warning(
                "cannot deserialize {}: {}".format(item["@id"], repr(error))
            )
            continue

One of the bad side effects of this is that the following line is never reached for this object:

https://github.com/collective/collective.exportimport/blob/main/src/collective/exportimport/import_content.py#L356:

            new.creation_date_migrated = creation_date

Therefore, when later running @@reset_dates, this Event will produce this error:

AttributeError: creation_date_migrated

This is because acquisition causes getattr(obj, "creation_date_migrated", None) to be true-ish, but del obj.creation_date_migrated raises AttributeError:

        created = getattr(obj, "creation_date_migrated", None)
        if created and created != obj.creation_date:
            obj.creation_date = created
            del obj.creation_date_migrated
            obj.reindexObject(idxs=["created"])
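
The failure mode can be reproduced in plain Python. Acquisition is not importable outside Zope, so this sketch uses class inheritance as a stand-in for a value that is only visible through the parent. The helper `has_own_attribute` is hypothetical; in Zope the equivalent guard would presumably inspect `aq_base(obj)` instead.

```python
class Parent(object):
    # Stand-in for a value the object only "sees" through its parent.
    creation_date_migrated = "2020-01-01"


class Child(Parent):
    pass


def has_own_attribute(obj, name):
    """True only if the attribute is stored on the object itself."""
    return name in vars(obj)


obj = Child()

# getattr finds the inherited value, so the check is true-ish ...
found = getattr(obj, "creation_date_migrated", None)

# ... but deleting fails, because nothing is stored on obj itself.
try:
    del obj.creation_date_migrated
    delete_failed = False
except AttributeError:
    delete_failed = True

# A guard on the object's own attributes avoids the error.
safe_to_delete = has_own_attribute(obj, "creation_date_migrated")
```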

All of this can be avoided by setting item['event_url'] = None, which is what I did in a custom global_dict_hook():

def global_dict_hook(self, item):
    url = item.get('event_url', None)
    if url == '':
        item['event_url'] = None
    return item

Ultimately, I'm not sure if this should be addressed in plone.restapi or here, but it's one of various issues (like #12) that affect default content types and should not require custom hooks.

[proposal] unified export format

problem

  • Currently you have to use several different forms to do both the export and then the import
  • You also have to know the right order to do the import when you have multiple export JSON files

proposed solution

User will go to the site setup and click on export and get a UI something like

-------------------------------------------------------------
| Warning: multiple exports selected. Download will be tar.gz |
-------------------------------------------------------------

# Exports
[x] Content
[x] File/Images
[  ] Users
[  ] Content Tree
[  ] Relations
[  ] Translations
[  ] Local Roles
[  ] Default Page Mapping
[  ] Object Positions in Parent
[  ] Comments


# Content Export

{query widget}
Type: Page
Path: /news depth:1
Path: /other-news depth:1
Creation date: > 1/1/20018

Selected content (21 items)
--------------------------------
| /news/item1
| /news/item2
| /other-news/big-news
--------------------------------

# File/Images Export

(o) url/path
(  ) binary in tar.gz 
(  ) base64 encoded in json

[Download] [Save to Server] [Dry Run] [Cancel]

Features this adds

  • tar.gz if multiple exports selected
    • still keeps json as multiple files
    • Would enable things to be put in the right order
    • still memory efficient as tar can be streamed.
  • somehow take every exporter and merge them into one UI.
    • Adapter to get schema?
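
As a rough illustration of the tar.gz idea, the following sketch packs several JSON exports into one archive with the standard library. The function name, the file names and the numeric prefixes that encode the import order are assumptions made for this example. The demo writes to an in-memory buffer; for real streaming, tarfile's stream mode ("w|gz") could be used on a non-seekable output.

```python
import io
import json
import tarfile


def build_export_archive(exports):
    """Pack named exports into a tar.gz, preserving their order.

    exports is a list of (filename, data) pairs; the list order is the
    intended import order. Hypothetical helper for illustration.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for filename, data in exports:
            payload = json.dumps(data).encode("utf-8")
            info = tarfile.TarInfo(name=filename)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


archive_bytes = build_export_archive(
    [
        ("01_content.json", [{"@id": "/news/item1"}]),
        ("02_relations.json", []),
    ]
)

# Reading the archive back yields the members in the order they were added.
with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
    names = archive.getnames()
    content = json.loads(archive.extractfile("01_content.json").read().decode("utf-8"))
```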

Alternatives considered

  • still have separate views?
  • use a single json format that contains the others
    • can still stream it and webserver might do gzip compression automatically anyway
    • can still choose to include blobs for small sites using base64 encoding.
    • con: might be harder for the user to inject or change blobs without writing code?

Wrong @id for exported Collections

Exported collections have the wrong @id attribute set.

[
    {
        "@id": "http://localhost:8080/Plone/@@export_content", 
        "@type": "Collection", 
        "UID": "e6b3bf21738d4866b9acd1d0e7a1cf51", 
        "allow_discussion": false, 
        "..."
    }
]

I was able to bypass this with a custom export view:

class CustomExportContent(ExportContent):

    def dict_hook_collection(self, item, obj):
        """Use this to modify or skip the serialized data by type.
        Return the modified dict (item) or None if you want to skip this particular object.
        """
        # Fix the @id for collections, which is set to "@@export_content" because of the HypermediaBatch in plone.restapi
        item["@id"] = obj.absolute_url()
        return item

For the import, only the @id is required, although there are more properties with the wrong url:

        "batching": {
            "@id": "http://localhost:8080/Plone/@@export_content", 
            "first": "http://localhost:8080/Plone/@@export_content?b_start=0", 
            "last": "http://localhost:8080/Plone/@@export_content?b_start=50", 
            "next": "http://localhost:8080/Plone/@@export_content?b_start=25"
        }, 

@pbauer, any opinion on how to fix this? Would it be ok to add the hook to the add-on directly?

Add modifiers to simplify handling migrations

Some changes need to be done to the serialized data when this tool is used for a migration (e.g. from Plone 4 to Plone 5 or 6) and/or from Archetypes to Dexterity.

To make this easier we could add a checkbox (checked by default) "Modify data for migrations".

If this is checked then some modifiers will run during export.

These could include:

Drop unused data

Some data that restapi includes is useless for migrations. E.g. @components, next_item, previous_item.
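
A modifier for this could be as small as the following sketch. The function name and the constant are hypothetical; the key names are the three mentioned above.

```python
UNUSED_KEYS = ("@components", "next_item", "previous_item")


def drop_unused_data(item, unused_keys=UNUSED_KEYS):
    """Remove keys that are useless for a migration from a serialized item."""
    for key in unused_keys:
        # pop with a default does not fail when the key is absent
        item.pop(key, None)
    return item


cleaned = drop_unused_data(
    {"@id": "/news/item1", "@components": {}, "next_item": {}, "title": "Item 1"}
)
```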

Drop all Relations

Relations are migrated separately. Having them in the data will mess up the site. This is probably easiest done by switching on custom serializers for IReferenceField (AT) and IRelationChoice and IRelationList (DX) that return None.

Change default-fieldnames (AT to DX):

    # Migrate AT to DX
    if item.get("expirationDate"):
        item["expires"] = item["expirationDate"]
    if item.get("effectiveDate"):
        item["effective"] = item["effectiveDate"]
    if item.get("excludeFromNav"):
        item["exclude_from_nav"] = item["excludeFromNav"]
    if item.get("subject"):
        item["subjects"] = item["subject"]

Fix datetime data

see #10

Fix issue with AT Text fields

TextField-export in Archetypes: Inspecting the AT-schema and applying a change for all Textfields if the RichtextWidget is not used (which means the field is probably Text in DX and not RichText).

    # In Archetypes Text is handled the same as RichText
    if isinstance(item.get("description", None), dict):
        item["description"] = item["description"]["data"]
    if isinstance(item.get("rights", None), dict):
        item["rights"] = item["rights"]["data"]

Fix collection-criteria

Some criteria have changed, e.g.

query = item.get("query", [])  # criteria are modified in place; pop() would drop "query" from item
for crit in query:
    if crit["o"].endswith("relativePath") and crit["v"] == "..":
        crit["v"] = "..::1"
    if crit["i"] == "portal_type" and crit["o"].endswith("selection.is"):
        crit["o"] = "plone.app.querystring.operation.selection.any"

Fix image links and scales

Use the code in https://github.com/collective/collective.migrationhelpers/blob/master/src/collective/migrationhelpers/images.py to fix links to images and make them editable in TinyMCE.

self.debug is not defined during relations export

Curious, because the function itself takes a debug parameter. (Plone 5.2 instance)

Traceback:

INFO:interpreter:Exporting relations from site
Traceback (most recent call last):
  File "/home/user/optplone/deployments/master/parts/client1/bin/interpreter", line 293, in <module>
    exec(_val)
  File "<string>", line 1, in <module>
  File "/home/user/optplone/deployments/master/src/ruddocom.policy/ruddocom/policy/export_embedded.py", line 101, in <module>
    full_export(site, exportpath, outputpath, what=what)
  File "/home/user/optplone/deployments/master/src/ruddocom.policy/ruddocom/policy/export_embedded.py", line 74, in full_export
    export_view()
  File "/home/user/optplone/deployments/master/src/collective.exportimport/src/collective/exportimport/export_other.py", line 71, in __call__
    all_stored_relations = self.get_all_references(debug)
  File "/home/user/optplone/deployments/master/src/collective.exportimport/src/collective/exportimport/export_other.py", line 136, in get_all_references
    if self.debug:
AttributeError: 'SimpleViewClass from /home/user/optplone/deploymen' object has no attribute 'debug'

git blame

2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 135)                             }
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 136)                             if self.debug:
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 137)                                 item["from_path"] = from_brain[0].getPath()
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 138)                                 item["to_path"] = to_brain[0].getPath()
52acc20d src/collective/exportimport/export_other.py         (Thibaut Born      2021-11-29 13:40:14 +0100 139)                             item = self.reference_hook(item)
280c5cf9 src/collective/exportimport/export_other.py         (Thibaut Born      2021-11-30 11:20:42 +0100 140)                             if item is None:
280c5cf9 src/collective/exportimport/export_other.py         (Thibaut Born      2021-11-30 11:20:42 +0100 141)                                 continue
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 142)                             results.append(item)

Exporting portlet with relation field fails

I have several relation fields in a portlet. Exporting portlets then fails because a RelationValue is not json serialisable:

http://localhost:9152/plone/@@export_portlets
Traceback (innermost last):
  Module ZPublisher.Publish, line 138, in publish
  Module ZPublisher.mapply, line 77, in mapply
  Module ZPublisher.Publish, line 48, in call_object
  Module collective.exportimport.export_other, line 477, in __call__
  Module json, line 251, in dumps
  Module json.encoder, line 209, in encode
  Module json.encoder, line 431, in _iterencode
  Module json.encoder, line 332, in _iterencode_list
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 332, in _iterencode_list
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 442, in _iterencode
  Module json.encoder, line 184, in default
TypeError: <z3c.relationfield.relation.RelationValue object at 0x114132050> is not JSON serializable

Somewhat related is this comment from Philip where he removes some relations code, although I guess this was only active when exporting content, and not portlets.

The following diff in the portlet export code fixes it for me:

$ git diff
diff --git a/src/collective/exportimport/export_other.py b/src/collective/exportimport/export_other.py
index f358a1c..383635a 100644
--- a/src/collective/exportimport/export_other.py
+++ b/src/collective/exportimport/export_other.py
@@ -535,13 +535,18 @@ def export_local_portlets(obj):
             settings = IPortletAssignmentSettings(assignment)
             if manager_name not in items:
                 items[manager_name] = []
+            from z3c.relationfield.relation import RelationValue
+            assignment_data = {}
+            for name in schema.names():
+                value = getattr(assignment, name, None)
+                if value and isinstance(value, RelationValue):
+                    value = value.to_object.UID()
+                assignment_data[name] = value
+
             items[manager_name].append({
                 'type': portlet_type,
                 'visible': settings.get('visible', True),
-                'assignment': {
-                    name: getattr(assignment, name, None)
-                    for name in schema.names()
-                },
+                'assignment': assignment_data,
             })
     return items

The code needs to be more robust, but those are details.
I am not sure if this is a reasonable place for this fix or if there is a more general place.

Ah, wait, using this works too:

json_compatible(getattr(assignment, name, None))

At least then you get an export without errors, although my earlier code that returns uuids could be preferable in some cases.
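
One candidate for a more general place is the `default` hook of `json.dumps`, which is only called for values the encoder cannot handle itself. The sketch below uses a stand-in class instead of the real RelationValue, and the attribute `target_uid` is invented for the illustration.

```python
import json


class FakeRelationValue(object):
    """Stand-in for z3c.relationfield.relation.RelationValue (illustration only)."""

    def __init__(self, target_uid):
        self.target_uid = target_uid


def json_default(value):
    """Called by json.dumps only for values it cannot serialize itself."""
    if isinstance(value, FakeRelationValue):
        return value.target_uid
    raise TypeError("%r is not JSON serializable" % (value,))


assignment = {
    "header": "Related",
    "target": FakeRelationValue("e6b3bf21738d4866b9acd1d0e7a1cf51"),
}
dumped = json.dumps(assignment, default=json_default, sort_keys=True)
```

With such a hook the conversion applies to every nested value in the export, so individual exporters would not need their own special cases.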

Include more migrations in default content migration

Some exports/imports that do not rely on other content (that might not yet exist at the time of importing) could be included in the default export/import of content. One example is constrains, which is implemented like that in #71

Other options are:

  • local roles
  • discussions/comments
  • portlets - I think portlets are safe to include because they can't hold relations or binary data, right?

We could add hooks like item = self.export_constrain(item, obj) and self.import_constrain(obj, item) for each to make it easy to override. We could also add checkboxes to enable/disable these extra steps during export.

In client projects I also export/import some marker-interfaces and annotations but I don't think that could be generalized.
It would probably be enough to add some more documentation with simple examples of how to do that.

On import ModifiedDate datestring its timezone is not recognised

Copying this issue from what I wrote on community.plone.org to keep the list of possible switches/fixes central on GitHub.

While importing an exported contenttype from Plone 4/AT to Plone 5.2 DX, the import routine gives a traceback on:

modified_data = datetime.strptime(modified, "%Y-%m-%dT%H:%M:%S%z")

The problem is the timezone part of the date string: the exported modified date is u'2011-10-13T13:49:57+01:00', with a : separator in the timezone, while strptime on Python 3 expects 0100. My workaround so far has been simple and is on https://github.com/collective/collective.exportimport/tree/modified_date_parser : use dateutil to parse the string on import.

However:

  • dateutil's parser is advanced magic; you can't/don't even pass a format string.
  • Should we maybe fix this on export of Archetypes content instead of enhancing the import?
  • Are there maybe other small deviations in the date export, and is it dependent on locale or just a difference between AT/DX serialisers?

If this is specific to AT, this fix could be added/combined with other small tweaks to make the AT export more compatible with DX import and put behind a checkbox/switch as discussed in #7.
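
An alternative that keeps the explicit format string is to normalise the offset before parsing. This is a sketch with a hypothetical helper name, not the code on the branch mentioned above. As far as I know, Python 3.7 and later accept the colon form with %z directly, so the normalisation mainly matters for older Python 3 versions.

```python
import re
from datetime import datetime, timedelta

# Matches a trailing UTC offset written with a colon, e.g. "+01:00".
_OFFSET_WITH_COLON = re.compile(r"([+-]\d{2}):(\d{2})$")


def parse_modified(value):
    """Parse an ISO-like date string whose offset may contain a colon."""
    normalized = _OFFSET_WITH_COLON.sub(r"\1\2", value)
    return datetime.strptime(normalized, "%Y-%m-%dT%H:%M:%S%z")


with_colon = parse_modified("2011-10-13T13:49:57+01:00")
without_colon = parse_modified("2011-10-13T13:49:57+0100")
```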

Feature request: exclude path

Just as content export has a "start from this path" input field, there should be a "do not recurse in the following paths" field (perhaps a multiline field / list) that makes the content exporter iterator completely skip anything below those paths. For many use cases, this is not necessary since the exported JSON can be filtered to exclude said content, but there are cases for very large sites where the user would rather not have to wait until everything is exported.
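
The matching rule for such a field could look like this sketch. `is_excluded` is a hypothetical helper. It only shows the path comparison; an efficient implementation would ideally keep the excluded subtrees out of the catalog query rather than filter afterwards.

```python
def is_excluded(path, excluded_paths):
    """Return True if path equals or lies below one of excluded_paths."""
    for excluded in excluded_paths:
        excluded = excluded.strip()
        if not excluded:
            continue  # ignore blank lines from a multiline field
        prefix = excluded.rstrip("/")
        # Compare with a trailing slash so that "/Plone/news" does not
        # accidentally match "/Plone/newsletter".
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


excluded_paths = ["/Plone/news", "", "/Plone/archive/"]
paths = ["/Plone/news/item1", "/Plone/newsletter", "/Plone/archive", "/Plone/front-page"]
kept = [path for path in paths if not is_excluded(path, excluded_paths)]
```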

Concomitant to this, perhaps it would not be such a bad idea to allow for exports of multiple roots rather than a single one. Right now I have four Plone sites in a single Zope instance, and I have to export them all individually rather than at once. It's a bit of a bummer, frankly (and I also could never get the plone api get view thingie to work in scripts — when I do it that way, no objects are exported).

Handle site root object default page

Currently, we do not export/import the default page of the site root. I propose to use an empty string for the "uuid" dict value to denote the default page of the site root, as a special case.

Package is not installable with pip

Trying to install collective.exportimport with pip does not work, raising the error
ERROR: Package 'collective.exportimport' requires a different Python: 3.8.1 not in '==2.7, >=3.6'

The issue is in the python_requires specification, and can be seen with:

pip install packaging

python
>>> from packaging.specifiers import SpecifierSet
>>> from packaging.version import Version
>>> Version("3.8.1") in SpecifierSet("==2.7, >=3.6")
False
>>> Version("2.7.10") in SpecifierSet("==2.7, >=3.6")
False

PEP 440 specifies: "The comma (",") is equivalent to a logical and operator: a candidate version must match all given version clauses in order to match the specifier as a whole."

Relations import should use plone.api.relation when available

Currently collective.relationhelpers is used.
When I add this package in a Plone 6 site, startup fails with a configuration conflict:

zope.configuration.config.ConfigurationConflictError: Conflicting configuration actions
  For: ('view', (<InterfaceClass Products.CMFPlone.interfaces.siteroot.IPloneSiteRoot>, <InterfaceClass zope.publisher.interfaces.browser.IDefaultBrowserLayer>), 'inspect-relations', <InterfaceClass zope.publisher.interfaces.browser.IBrowserRequest>)
    File "/Users/maurits/shared-eggs/cp39/Products.CMFPlone-6.0.0a1.dev0-py3.9.egg/Products/CMFPlone/controlpanel/browser/configure.zcml", line 326.2-332.8
        <browser:page
            name="inspect-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".relations.RelationsInspectControlpanel"
            template="relations_inspect.pt"
            permission="Products.CMFPlone.InspectRelations"
            />
    File "/Users/maurits/shared-eggs/cp39/collective.relationhelpers-1.5-py3.9.egg/collective/relationhelpers/configure.zcml", line 9.2-15.8
        <browser:page
            name="inspect-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".api.InspectRelationsControlpanel"
            template="relations_inspect.pt"
            permission="cmf.ManagePortal"
            />
  For: ('view', (<InterfaceClass Products.CMFPlone.interfaces.siteroot.IPloneSiteRoot>, <InterfaceClass zope.publisher.interfaces.browser.IDefaultBrowserLayer>), 'rebuild-relations', <InterfaceClass zope.publisher.interfaces.browser.IBrowserRequest>)
    File "/Users/maurits/shared-eggs/cp39/Products.CMFPlone-6.0.0a1.dev0-py3.9.egg/Products/CMFPlone/controlpanel/browser/configure.zcml", line 334.2-340.8
        <browser:page
            name="rebuild-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".relations.RelationsRebuildControlpanel"
            template="relations_rebuild.pt"
            permission="cmf.ManagePortal"
            />
    File "/Users/maurits/shared-eggs/cp39/collective.relationhelpers-1.5-py3.9.egg/collective/relationhelpers/configure.zcml", line 17.2-23.8
        <browser:page
            name="rebuild-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".api.RebuildRelationsControlpanel"
            template="relations_rebuild.pt"
            permission="cmf.ManagePortal"
            />

Alternatively, collective.relationhelpers should be fixed to not fail in this case.

Ignoring some portlets during import

I have a portlet export that includes a portlet that I no longer want. Currently, the import fails, because the portlet type is not known:

Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 167, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 376, in publish_module
  Module ZPublisher.WSGIPublisher, line 271, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 68, in call_object
  Module collective.exportimport.import_other, line 582, in __call__
  Module collective.exportimport.import_other, line 597, in import_portlets
  Module collective.exportimport.import_other, line 618, in register_portlets
  Module zope.component._api, line 165, in getUtility
zope.interface.interfaces.ComponentLookupError:
(<InterfaceClass zope.component.interfaces.IFactory>, 'collective.quickupload.QuickUploadPortlet')

It would be nice if the importer could ignore it.
I see two options:

  1. Catch the ComponentLookupError from above, log a warning, and continue with the next portlet.
  2. Have a list of portlet types that must be ignored. Empty by default, but easily monkey patched.

I prefer the first one, as it is a small change and works for everyone. But maybe we prefer that integrators explicitly catch the error. For the second one I now have this monkey patch, from which I could make a PR.

from collective.exportimport import import_other

import logging


logger = logging.getLogger(__name__)
_orig_register_portlets = import_other.register_portlets
IGNORE_PORTLET_TYPES = [
    "collective.quickupload.QuickUploadPortlet",
]


def register_portlets(obj, item):
    """Register portlets for one object.

    CHANGE compared to original: pop unwanted portlets.

    I tried to override the browser view first,
    but then I would have had to copy the template.
    """
    for manager_name, portlets in item.get("portlets", {}).items():
        if not portlets:
            continue
        ignore = []
        for portlet_data in portlets:
            if portlet_data["type"] in IGNORE_PORTLET_TYPES:
                logger.info(
                    "Ignoring portlet type %s at %s",
                    portlet_data["type"],
                    obj.absolute_url(),
                )
                ignore.append(portlet_data)
        for portlet_data in ignore:
            portlets.remove(portlet_data)
    return _orig_register_portlets(obj, item)


import_other.register_portlets = register_portlets
logger.info(
    "Patched collective.exportimport register_portlets to ignore these types: %s",
    IGNORE_PORTLET_TYPES,
)
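
For comparison, the first option could be sketched as follows. This is not the add-on's code: `create_known_portlets` and the `get_factory` callable are hypothetical stand-ins for the factory lookup that fails in the traceback above. ComponentLookupError is, to my knowledge, a LookupError subclass, which is why the sketch catches LookupError and can be tried without Zope.

```python
import logging

logger = logging.getLogger(__name__)


def create_known_portlets(portlets, get_factory):
    """Create an assignment for every portlet whose type can be looked up.

    get_factory is a hypothetical stand-in for the factory lookup; it
    must raise a LookupError subclass for unknown portlet types.
    """
    created = []
    for portlet_data in portlets:
        portlet_type = portlet_data["type"]
        try:
            factory = get_factory(portlet_type)
        except LookupError:
            # Log a warning and continue with the next portlet.
            logger.warning("Skipping unknown portlet type %s", portlet_type)
            continue
        created.append(factory(**portlet_data.get("assignment", {})))
    return created


# Usage with a plain dict as the stand-in registry (KeyError is a LookupError).
registry = {"portlets.Static": dict}
result = create_known_portlets(
    [
        {"type": "portlets.Static", "assignment": {"header": "Hello"}},
        {"type": "collective.quickupload.QuickUploadPortlet"},
    ],
    registry.__getitem__,
)
```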

Do we have a preference?

ATField (without base64 data) -> DXField: data missing

Somehow I do not get the trick. If I export my ATFileAttachment without base64 data it looks like this:

[Screenshot 2021-05-27 at 09:25:15]

Now importing this with a custom_dict_hook ("ATFileAttachment" -> "DXFile") creates a File object as expected, but the file field has no data 😢 ... so if I'm correct, the plone.restapi deserializer doesn't load data from remote origins? Do I have to take care of this myself?

@pbauer @fredvd can anyone give me a hint on this?

Using JSONL as an alternative migration format

Exported JSON files can become rather large, in particular with inline binary data (which is often needed rather than having a reference to a blob file). Using JSONL would improve the handling of exports a lot. In particular, you could filter JSON records more easily using command line tools like grep.
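
A minimal sketch of what reading and writing JSONL involves, using only the standard library. The helper names are hypothetical. Because json.dumps without indentation never emits a raw newline, each record stays on one line, which is what makes line-based tools usable.

```python
import io
import json


def write_jsonl(items, fileobj):
    """Write one JSON document per line."""
    for item in items:
        fileobj.write(json.dumps(item) + "\n")


def read_jsonl(fileobj):
    """Yield one item per non-empty line without loading the whole file."""
    for line in fileobj:
        line = line.strip()
        if line:
            yield json.loads(line)


buffer = io.StringIO()
write_jsonl(
    [
        {"@id": "/a", "@type": "Document", "text": "line one\nline two"},
        {"@id": "/b", "@type": "File"},
    ],
    buffer,
)
line_count = len(buffer.getvalue().splitlines())

buffer.seek(0)
documents = [item for item in read_jsonl(buffer) if item["@type"] == "Document"]
```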

Component conversation_view not available when executing view export_content() from the Plone CLI

I have a snippet of code that is supposed to export (and it works with all exports other than content), but when I try it with content export, I get a zero-byte JSON file. This snippet of code runs as an entry point of my bin/client program.

Does anyone know what the problem with the export is?

Code: https://gist.github.com/Rudd-O/f46154c80eb9937ec387e2b460ebbe8b

EDIT:

Ultimately the goal is to be able to command an export of everything via the Plone CLI (bin/client -c export.py), primarily for (but not limited to) migration automation and testing. All other exports work correctly using this code — only the content one does not.

Installation instruction question

Hi from Plone Conf!

In the Installation header, the README says "You don't need to install the add-on." after the instructions for installing through buildout. There does not seem to be any information about using collective.exportimport without installation. How would collective.exportimport be used without installation? I am personally interested in exporting a complete Plone 4 site to compare results with jsonify.

commit by @pbauer

Feature request: import / export content rules

Plone content rules are stored in an object IRuleStorage which is not visible in Zope space and is also not visible in content space (it can only be obtained using getUtility()). Implementation-wise, IRuleStorage is implemented as a simple OOBTree(), that can be acquired (plone.app.contentrules.browser.assignments.acquired_rules() has the scoop).

It would be great if the import/export framework had a step to deal with the rule storage. This would require at least a custom pair of serializer/deserializer.

error on bin/buildout on plone 4.2.1

Hello, I'm trying to install on Plone 4.2.1 but I get this error.
Is this compatible with 4.2.1? Thanks


root@intranet2:/opt/buildout/Plone4.2.1Intranet/zeocluster#  ./bin/buildout
Updating zeoserver.
Installing client1.
Getting distribution for 'hurry.filesize'.
Got hurry.filesize 0.9.
Getting distribution for 'collective.exportimport'.
/opt/buildout/Plone4.2.1Intranet/Python-2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'project_urls'
  warnings.warn(msg)
/opt/buildout/Plone4.2.1Intranet/Python-2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
error: Setup script exited with error in collective.exportimport setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers
An error occured when trying to install collective.exportimport main. Look above this message for any errors that were output by easy_install.
While:
  Installing client1.
  Getting distribution for 'collective.exportimport'.
Error: Couldn't install: collective.exportimport main
*************** PICKED VERSIONS ****************
[versions]
Products.LinguaPlone = 4.1.2
hurry.filesize = 0.9

*************** /PICKED VERSIONS ***************

more control over content exported (via query widget)

problem

  • currently can only select by type or single path (+ depth)

Solution

  • single query widget to replace path, depth and type
  • have a default query with all types and root folder selected to make it quicker to adjust
  • show count/preview of selected content. Maybe counts of all the content types selected?
  • some kind of adapter to make it easy to insert specific options if another exporter is added?
-------------------------------------------------------------
| Warning: multiple exports selected. Download will be tar.gz |
-------------------------------------------------------------

# Exports
[x] Content
[x] File/Images
[  ] Users
[  ] Content Tree
[  ] Relations
[  ] Translations
[  ] Local Roles
[  ] Default Page Mapping
[  ] Object Positions in Parent
[  ] Comments

# Content Export

{query widget}
Type: Page, News Item
Path: /news depth:1
Path: /other-news depth:1
Creation date: > 1/1/20018

-----------------------------
Selected content (21 items, 9Mb)
- News Item (10, 5Mb)
- Page (9, 2Mb)
- Folders (implicit) (2, 1Mb)
-----------------------------

# File/Images Export

(o) url/path
(  ) binary in tar.gz 
(  ) base64 encoded in json

[Download] [Save to Server] [Dry Run] [Cancel]

Alternatives considered

  • it would be more intuitive to put the options for each exporter indented directly under its checkbox.
    like "Conditionally revealing a related question in GDS"
  • Query widget should probably move to the top since it's used for almost all exporters?
  • show list of content rather than counts by type? or maybe counts by path?

Relations import, isReferencing is dropped in the target site after being filled by import_content?

Something funny is going on with the relations import.

  • I think I forgot to import relations in my target Plone 5.2 site. When I use @@inspect_relations from collective.relationhelpers I see around 1800 isReferencing relations in the target site.
  • So I exported relations again from the source Plone 4 site and imported them in the target site.
  • Now my 1800 isReferencing relations are gone, but 800 relatesTo (related items) relations have been added.

In import_other isReferencing is added to the ignores list:

ignore = [
    "translationOf",  # old LinguaPlone
    "isReferencing",  # linkintegrity
    "internal_references",  # obsolete
    # ...
]

From the rest of the code in this method it seems all existing relations in the target site are removed and only the 'sanitised' relations are imported again, but this destroys all linkintegrity relations.

I assume the import_content imports are recreating the isReferencing relations while restoring the content items.

The most obvious fix would be to not drop the relations

in:

for rel in data:
    if rel["relationship"] in ignore:
        continue
    rel["from_attribute"] = self.get_from_attribute(rel)
    all_fixed_relations.append(rel)
all_fixed_relations = sorted(
    all_fixed_relations, key=itemgetter("from_uuid", "from_attribute")
)
relapi.purge_relations()
relapi.cleanup_intids()
relapi.restore_relations(all_relations=all_fixed_relations)

But this does not work, I get ObjectMissing Errors in z3c.relationfield.event.updateRelations where it tries to list existing relations:

[9] > /Users/fred/.buildout/eggs/cp38/z3c.relationfield-0.9.0-py3.8.egg/z3c/relationfield/event.py(81)updateRelations()
-> rels = list(catalog.findRelations({'from_id': obj_id}))
[10]   /Users/fred/.buildout/eggs/cp38/zc.relation-1.1.post2-py3.8.egg/zc/relation/catalog.py(734)<genexpr>()
-> return (self._relTools['load'](t, self, cache) for t in tokens)
[11]   /Users/fred/.buildout/eggs/cp38/z3c.relationfield-0.9.0-py3.8.egg/z3c/relationfield/index.py(49)load()
-> return intids.getObject(token)
[12]   /Users/fred/.buildout/eggs/cp38/zope.intid-4.3.0-py3.8.egg/zope/intid/__init__.py(89)getObject()
-> raise ObjectMissingError(id)

Stranger is that when I step back in the debugger to frame 9 and execute the same line again I do get the rels list:

(Pdb++) list(catalog.findRelations({'from_id': obj_id}))
[<z3c.relationfield.relation.RelationValue object at 0x11eefdac0 oid 0x200b0 in <Connection at 10d0bb760>>]

Or I could remove isReferencing from the ignores list

but then I'm importing the Plone 4 Archetypes linkintegrity lists on recreated Dexterity Content. It could work as no paths/ids etc have changed and for non existing content the relations will be dropped. But it doesn't feel very clean to do this.

[edit:] I tried this, and the number of isReferencing relations actually increases when I include isReferencing relations on import. So importing content creates 1848 isReferencing relations, then restoring relations ups isReferencing to 1941 relations. :-O

Export existing relations on the target site and merge them with the imported relations.

Meh. I mean doing this in the import code: make an extra export on the fly, merge it with the incoming relations.json and reapply, dropping isReferencing only from the relations.json.
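
A minimal sketch of this merge idea, in plain Python without any Plone imports. It assumes relations are dicts with the keys from_uuid, to_uuid and relationship (to_uuid is an assumption, the snippets above only show from_uuid and relationship). The function name merge_relations and the sample data are hypothetical, not part of collective.exportimport.

```python
def merge_relations(existing, incoming, drop_from_incoming=("isReferencing",)):
    """Merge relations already on the target site with imported ones.

    Relationships listed in drop_from_incoming are removed from the
    incoming data only, so the ones already on the target site survive.
    """
    kept_incoming = [
        rel for rel in incoming if rel["relationship"] not in drop_from_incoming
    ]
    merged = []
    seen = set()
    for rel in list(existing) + kept_incoming:
        key = (rel["from_uuid"], rel["to_uuid"], rel["relationship"])
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        merged.append(rel)
    return merged


# Hypothetical sample data
existing = [
    {"from_uuid": "a", "to_uuid": "b", "relationship": "isReferencing"},
]
incoming = [
    {"from_uuid": "a", "to_uuid": "b", "relationship": "isReferencing"},
    {"from_uuid": "a", "to_uuid": "c", "relationship": "relatesTo"},
    {"from_uuid": "x", "to_uuid": "y", "relationship": "isReferencing"},
]
merged = merge_relations(existing, incoming)
```

The merged list could then be handed to the restore step instead of the raw relations.json data; whether that avoids the ObjectMissingError above is untested.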

@pbauer How did you deal with this so far while using collective.exportimport?

Small improvement: add z3c.relationfield, plone.app.contenttypes in install_requires/setup.py

Using a really minimal Plone 4 installation (https://github.com/collective/minimalplone4/) we get these errors when adding collective.exportimport to eggs:

collective.exportimport-1.0-py2.7.egg/collective/exportimport/configure.zcml", line 13.2-19.8
    ImportError: No module named relationfield.interfaces

collective.exportimport-1.0-py2.7.egg/collective/exportimport/configure.zcml", line 57.2-58.52
    ImportError: No module named contenttypes.interfaces

Funny that this wasn't caught on tests.

I needed other pins as well, but they are out of scope for this project (I put them here for the sake of documentation):

# Python 2/Plone 4 compatibility.                                                          
plone.restapi = 6.13.8
PyJWT = 1.7.1                                                                      
# https://github.com/Julian/jsonschema/issues/453                                  
# Getting distribution for 'pyrsistent>=0.14.0'                                    
# ValueError: need more than 0 values to unpack                                    
jsonschema = 2.6.0

topic sort criteria do not get added to the json on export

When testing the topic to collection migration we noticed that the sort_on and sort_reversed metadata do not seem to get migrated.
It looks like adding them to the metadata was simply forgotten.

With a quick look it seems that it might be fixed by replacing

    self._collection_sort_reversed = criterion.getReversed()
    self._collection_sort_on = criterion.Field()

with

    topic_metadata["sort_reversed"] = criterion.getReversed()
    topic_metadata["sort_on"] = criterion.Field()

in this file:
https://github.com/collective/collective.exportimport/blob/main/src/collective/exportimport/serializer.py#L341

or adding it below the
topic_metadata["query"] = json_compatible(formquery) on line 361

related: https://github.com/plone/plone.app.contenttypes/blob/1.1.1/plone/app/contenttypes/migration/topics.py#L505

Let me know if this seems correct to you, then we will make a PR

"childen"

The export content page has a typo, it should say children, it says childen.

Export and import content revisions

I plan to implement the export and import of the full revision history created by CMFEditions.
I'm still undecided whether it should be an optional step during the default migration or an additional export/import step. Maybe the latter, to be able to limit the number of exported and imported revisions.

AttributeError: modification_date_migrated while resetting dates

Traceback (most recent call last):
  File "/home/user/optplone/deployments/601a/parts/client1/bin/interpreter", line 294, in <module>
    exec(_val)
  File "<string>", line 1, in <module>
  File "/home/user/optplone/deployments/601a/src/ruddocom.policy/src/ruddocom/policy/ctl/import_embedded.py", line 105, in <module>
    full_import(site, importpath, what)
  File "/home/user/optplone/deployments/601a/src/ruddocom.policy/src/ruddocom/policy/ctl/import_embedded.py", line 85, in full_import
    fixer()
  File "/home/user/optplone/deployments/601a/src/collective.exportimport/src/collective/exportimport/import_content.py", line 697, in __call__
    portal.ZopeFindAndApply(portal, search_sub=True, apply_func=reset_dates)
  File "/home/user/optplone/buildout-cache/eggs/Zope-5.3-py3.8.egg/OFS/FindSupport.py", line 171, in ZopeFindAndApply
    self.ZopeFindAndApply(ob, obj_ids, obj_metatypes,
  File "/home/user/optplone/buildout-cache/eggs/Zope-5.3-py3.8.egg/OFS/FindSupport.py", line 171, in ZopeFindAndApply
    self.ZopeFindAndApply(ob, obj_ids, obj_metatypes,
  File "/home/user/optplone/buildout-cache/eggs/Zope-5.3-py3.8.egg/OFS/FindSupport.py", line 165, in ZopeFindAndApply
    apply_func(ob, (apply_path + '/' + p))
  File "/home/user/optplone/deployments/601a/src/collective.exportimport/src/collective/exportimport/import_content.py", line 688, in reset_dates
    del obj.modification_date_migrated
AttributeError: modification_date_migrated

Why is this happening?
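
One possible reading of the traceback, offered as an unconfirmed assumption: the attribute is visible through getattr (for example because it is found on a parent) but is not stored on the object itself, so del fails. The sketch below uses plain Python class inheritance as a stand-in for that situation. The classes and the helper drop_migrated_date are hypothetical and not taken from collective.exportimport.

```python
class Parent:
    # Stand-in for a value that is reachable from a child object
    # without being stored on the child itself.
    modification_date_migrated = "2020-01-01"


class Item(Parent):
    pass


def drop_migrated_date(obj):
    """Delete the marker only if it is stored on the object itself."""
    if "modification_date_migrated" in vars(obj):
        del obj.modification_date_migrated
        return True
    return False


item = Item()
# getattr finds the value through the parent class ...
visible = getattr(item, "modification_date_migrated", None)
# ... but a bare `del item.modification_date_migrated` would raise
# AttributeError, because the instance does not own the attribute.
removed = drop_migrated_date(item)
```

In a real site the equivalent check would probably be made against the object without its acquisition wrapper (Acquisition's aq_base), which is also an assumption here.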

plone.formwidget.geolocation.geolocation.Geolocation: No converter for making ....JSON compatible

Not a bug, perhaps a feature request for supporting plone.formwidget.geolocation.geolocation.Geolocation:

2021-12-11 18:05:41,976 INFO    [collective.exportimport.export_content:299][waitress-2] Error exporting https://test.dynamore.de/en/locations/subsidiaries/dynamore-swiss-en: No converter for making <plone.formwidget.geolocation.geolocation.Geolocation object at 0x7fed111cff28> (<class 'plone.formwidget.geolocation.geolocation.Geolocation'>) JSON compatible

Default page for site root not imported

All my default pages import, except for the default page of the Plone site root. I notice in defaultpages.json that there is an object with UID plone_site_root, but this object ID does not appear in the content.json file (the actual Plone site root has a concrete UID).

It does not look like the defaultpages routine that looks for the site root actually links the default page with the site root.

What gives?

KeyError: 'parent' while exporting discussion comments

Plone 4.3.20 (AT) with current checkout from master generates these errors for discussion items:

2022-04-12T16:07:55 ERROR collective.exportimport.export_content Error exporting http://dev2.zopyx.de:5080/eteaching/community/communityevents/ringvorlesung/hybride-lehrszenarien-gestalten/++conversation++default/1602677734552386
Traceback (most recent call last):
  File "/home/ajung/sandboxes/iwm/plone4.buildout/src/collective.exportimport/src/collective/exportimport/export_content.py", line 284, in export_content
    item = self.fix_url(item, obj)
  File "/home/ajung/sandboxes/iwm/plone4.buildout/src/collective.exportimport/src/collective/exportimport/export_content.py", line 428, in fix_url
    if item["parent"]["@id"] != parent_url:
KeyError: 'parent'

Add export/import for default_page

When properly setting default pages with setDefaultPage('foo'), the target object foo has to exist, and foo is also reindexed for the index is_default_page. So this is a task that needs to be handled after content migration, same as relations.
A side-benefit is that you can skip this part when you want to migrate to a Plone 6 site with Volto (because Volto does not support default-pages).

Output src/dest paths on relations.json to improve logging of failed imported relations

If the addition of paths to the relations export, which is currently optional via debug=1, were always done:

if self.debug:
    item["from_path"] = source.absolute_url_path()
    item["to_path"] = target.absolute_url_path()
results.append(item)

we could provide much better logging on the import step. Currently it only logs:

2021-05-20 14:30:29,877 INFO [collective.relationhelpers.api:237][waitress-2] 7b02d3b74f57cd1d4a1c782fd74dd649 is missing
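
A small sketch of what a more helpful message could look like if the paths were always exported. It assumes each relation dict carries from_uuid and to_uuid next to the from_path and to_path keys shown above. The helper describe_missing and the sample paths are hypothetical.

```python
def describe_missing(rel, missing_uuid):
    """Build a log message that includes the path when it is known."""
    path = None
    if rel.get("from_uuid") == missing_uuid:
        path = rel.get("from_path")
    elif rel.get("to_uuid") == missing_uuid:
        path = rel.get("to_path")
    if path:
        return "%s (%s) is missing" % (missing_uuid, path)
    # Fall back to the current message when no path was exported
    return "%s is missing" % missing_uuid


# Hypothetical relation as it might appear in relations.json
rel = {
    "from_uuid": "7b02d3b74f57cd1d4a1c782fd74dd649",
    "to_uuid": "0123456789abcdef0123456789abcdef",
    "from_path": "/plone/news/some-item",
    "to_path": "/plone/other/target",
}
message = describe_missing(rel, "7b02d3b74f57cd1d4a1c782fd74dd649")
```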

Collection queries with old query param

When exporting ATCollection items the queries might contain outdated configurations. The following query doesn’t work in Plone 5 with plone.app.querystring > 1.3.2 (https://github.com/plone/plone.app.querystring/blob/master/CHANGES.rst#1312-2015-11-26)

        "query": [
            {
                "i": "portal_type", 
                "o": "plone.app.querystring.operation.selection.is", 
                "v": [
                    "News Item"
                ]
            }, 
            {
                "i": "path", 
                "o": "plone.app.querystring.operation.string.relativePath", 
                "v": "../"
            }
        ],

The correct version would be:

        "query": [
            {
                "i": "portal_type", 
                "o": "plone.app.querystring.operation.selection.any", 
                "v": [
                    "News Item"
                ]
            }, 
            {
                "i": "path", 
                "o": "plone.app.querystring.operation.string.relativePath", 
                "v": "../"
            }
        ],

With plone.app.querystring there is an upgrade step available (8 -> 9) which you can run to fix imported collections with the wrong queries.

Besides that, should we add some logic to adjust the queries on export/import?
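
A minimal sketch of such logic, working on the exported JSON only. It assumes that every selection.is criterion with a list value should become selection.any, as in the example above; other cases are left untouched. The function name fix_selection_operations is hypothetical.

```python
OLD_OPERATION = "plone.app.querystring.operation.selection.is"
NEW_OPERATION = "plone.app.querystring.operation.selection.any"


def fix_selection_operations(query):
    """Return a copy of the query with outdated selection operations replaced."""
    fixed = []
    for criterion in query:
        criterion = dict(criterion)  # do not modify the exported data in place
        if criterion.get("o") == OLD_OPERATION and isinstance(
            criterion.get("v"), list
        ):
            criterion["o"] = NEW_OPERATION
        fixed.append(criterion)
    return fixed


old_query = [
    {
        "i": "portal_type",
        "o": "plone.app.querystring.operation.selection.is",
        "v": ["News Item"],
    },
    {
        "i": "path",
        "o": "plone.app.querystring.operation.string.relativePath",
        "v": "../",
    },
]
new_query = fix_selection_operations(old_query)
```

Something like this could run in a hook on export or import, as an alternative to running the upgrade step afterwards.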

Export creator information

Currently, content is exported without creator information. There are cases, however, when that information is needed and it is not sufficient to have imported objects created by the importing user. Maybe it would be useful to export creator information optionally, but if so, I'm not sure about whether it should be done by default.

Importing richtext via restapi unescapes escaped html-entities

In plone.restapi deserializing richtext uses html_parser.unescape(data) before setting the RichTextValue. See https://github.com/plone/plone.restapi/blob/master/src/plone/restapi/deserializer/dxfields.py#L292

This leads to broken content because code-examples are transformed to html-tags: <pre>Code example: &lt;h2&gt;Heading 2&lt;/h2&gt; example</pre> becomes <pre>Code example: <h2>Heading 2</h2> example</pre>

I'm not sure if this is a bug in restapi but for the purpose of exportimport I will override the RichTextFieldDeserializer with a version that does not do that for now.
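
The effect can be reproduced outside Plone. This sketch uses html.unescape from the standard library as a stand-in for the html_parser.unescape call mentioned above, assuming both behave the same for these entities.

```python
import html

# Richtext as stored on the source site: the code sample is escaped
stored = "<pre>Code example: &lt;h2&gt;Heading 2&lt;/h2&gt; example</pre>"

# Unescaping turns the escaped sample into real markup
unescaped = html.unescape(stored)
```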

Exporting relations sometimes fails with certain content objects that cannot be adapted (on crufty ZODBs)

Here is a diff that allows the program to continue.

diff --git a/src/collective/exportimport/export_other.py b/src/collective/exportimport/export_other.py
index 2cf8e33..201f7e7 100644
--- a/src/collective/exportimport/export_other.py
+++ b/src/collective/exportimport/export_other.py
@@ -117,7 +117,12 @@ class ExportRelations(BrowserView):
             if relation_catalog:
                 portal_catalog = getToolByName(self.context, "portal_catalog")
                 for rel in relation_catalog.findRelations():
-                    if rel.from_path and rel.to_path:
+                    try:
+                        rel_from_path_and_rel_to_path = rel.from_path and rel.to_path
+                    except ValueError:
+                        logger.exception("Cannot export relation %s, skipping", rel)
+                        continue
+                    if rel_from_path_and_rel_to_path:
                         from_brain = portal_catalog(
                             path=dict(query=rel.from_path, depth=0)
                         )

Imported portlet fails

With this data:

[
    {
        "portlets": {
            "plone.footerportlets": [
                {
                    "type": "plone.portlet.static.Static",
                    "visible": true,
                    "assignment": {
                        "header": "Want to comment on this?",
                        "text": {
                            "data": "<p style=\"text-align: center;\"><em>Want to comment on this?\u00a0 </em><a rel=\"noopener\" target=\"_blank\" href=\"https://t.me/Site_com\" data-linktype=\"external\" data-val=\"https://t.me/Site_com\"><strong>Join our Telegram</strong> channel</a> and share your opinions in each post!</p>",
                            "content-type": "text/html",
                            "encoding": "utf-8"
                        },
                        "omit_border": true,
                        "footer": null,
                        "more_url": null
                    }
                }
            ]
        },
        "uuid": "1bb9868bd9de4637977348acab5d1893"
    },
[...]

the imported portlet won't render, and editing shows this:

We’re sorry, but there seems to be an error…

Here is the full error message:

Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 167, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 376, in publish_module
  Module ZPublisher.WSGIPublisher, line 271, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 68, in call_object
  Module plone.app.portlets.browser.formhelper, line 177, in __call__
  Module z3c.form.form, line 233, in __call__
  Module plone.z3cform.fieldsets.extensible, line 65, in update
  Module plone.z3cform.patch, line 30, in GroupForm_update
  Module z3c.form.group, line 132, in update
  Module z3c.form.form, line 136, in updateWidgets
  Module z3c.form.field, line 277, in update
  Module plone.app.textfield.widget, line 42, in update
  Module z3c.form.browser.textarea, line 37, in update
  Module z3c.form.browser.widget, line 171, in update
  Module Products.CMFPlone.patches.z3c_form, line 46, in _wrapped
  Module z3c.form.widget, line 132, in update
  Module plone.app.textfield.widget, line 99, in toWidgetValue
ValueError: Can not convert {'data': '<p style="text-align: center;"><em>Want to comment on this?\xa0 </em><a rel="noopener" target="_blank" href="https://t.me/RuddO_com" data-linktype="external" data-val="https://t.me/RuddO_com"><strong>Join our Telegram</strong> channel</a> and share your opinions in each post!</p>', 'content-type': 'text/html', 'encoding': 'utf-8'} to an IRichTextValue

Looks like the text object is being passed in lieu of the text subobject. Something is wrong with the deserializer.

On import DX-deserializer fails if expires is set before effective

In old Plone sites this is apparently no problem to set, but when we call the DX deserializer on an imported item where effective > expires, the import is aborted with a ValidationError:

orig_error is my local patch in plone.restapi to see the real message when I get dropped in to a PDB (PDBDebugMode)


*** zExceptions.BadRequest: [{'error': 'ValidationError', 
'message': 'error_expiration_must_be_after_effective_date', 
'orig_error': EffectiveAfterExpires('error_expiration_must_be_after_effective_date'),
'id': 'http://localhost:9050/plone/nl/kalender/asfasdfdsfasdf.pdf', 

'orig_data': {'@id': 'http://localhost:9050/plone/nl/kalender/asdfasdfsdfads.pdf', '@type': 'File', 'UID': '5397b307c9cc784319fca831cb0994d9', 'allow_discussion': False, 'contributors': [], 'created': '2015-09-07T12:40:17+00:00', 'creators': ['asdfasfdf'], 'description': None, 'effective': '2015-09-07T12:40:00+00:00', 'exclude_from_nav': True, 'expires': '2014-02-01T00:00:00+00:00',
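
Such items could be detected before the deserializer runs. The sketch below works on the exported item dict and assumes that effective and expires are ISO 8601 strings as in the data above. The helper effective_after_expires is hypothetical, and what to do with a hit (drop expires, swap the dates, fix the source site) is left open.

```python
from datetime import datetime


def effective_after_expires(item):
    """Return True if the item would trip the effective/expires validation."""
    effective = item.get("effective")
    expires = item.get("expires")
    if not effective or not expires:
        return False  # nothing to compare
    return datetime.fromisoformat(effective) > datetime.fromisoformat(expires)


# Values taken from the failing item above
item = {
    "effective": "2015-09-07T12:40:00+00:00",
    "expires": "2014-02-01T00:00:00+00:00",
}
needs_fix = effective_after_expires(item)
```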

Export/import workflow_history

I plan to add support for export/import of workflow history. It will probably be straightforward and part of the default process.

Export part of a production site?

Hi, sorry for not having a closer look first, but if it is possible, then all is good, if it is still not possible, but desirable, I might contribute the missing code if such a use case would be liked:

So my use case would be to be able to export (and re-import) a part of the website. The main idea is to export anything that is linked from a few overview pages and be able to re-import that on an empty plone site. This would allow for example our designer to work with real content but not have to work with the 1M objects we have on our database 🙂

Is that already possible? 🤔 would it be ok to provide such a functionality here in this package?
