
collective.exportimport

Export and import content, members, relations, translations, localroles and much more.

Export and import all kinds of data from and to Plone sites using an intermediate json-format. The main use-case is migrations, since it enables you, for example, to migrate from Plone 4 with Archetypes and Python 2 to Plone 6 with Dexterity and Python 3 in one step. Most features use plone.restapi to serialize and deserialize data.

See also the training on migrating with exportimport: https://training.plone.org/migrations/exportimport.html

Features

  • Export & Import content
  • Export & Import members and groups with their roles
  • Export & Import relations
  • Export & Import translations
  • Export & Import local roles
  • Export & Import order (position in parent)
  • Export & Import discussions/comments
  • Export & Import versioned content
  • Export & Import redirects

Export supports:

  • Plone 4, 5 and 6
  • Archetypes and Dexterity
  • Python 2 and 3
  • plone.app.multilingual, Products.LinguaPlone, raptus.multilanguagefields

Import supports:

  • Plone 5.2+, Dexterity, Python 2 and 3, plone.app.multilingual

Installation

Install collective.exportimport as you would install any other Python package.
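
For example, with Buildout you would add the package to the eggs of your instance (a minimal sketch; the section name and other settings depend on your setup):

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    collective.exportimport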

You don't need to activate the add-on in the Site Setup Add-ons control panel to use the forms @@export_content and @@import_content in your site.

If you need help installing add-ons, see:

  • for Plone 4: https://4.docs.plone.org/adapt-and-extend/install_add_ons.html
  • for Plone 5: https://5.docs.plone.org/manage/installing/installing_addons.html
  • for Plone 6: https://6.docs.plone.org/install/manage-add-ons-packages.html

Python 2 compatibility

This package is compatible with Python 3 and Python 2. Depending on the Python version, different versions of its dependencies will be installed. If you run into problems, file an issue at https://github.com/collective/collective.exportimport/issues

Usage

Export

Use the form with the URL /@@export_content, and select what you want to export:

[Screenshot of the export form]

You can export one or more types, and either a whole site or only a specific path in a site. Since items are exported ordered by path, importing them will create the same structure as you had originally.

The downloaded json-file will have the name of the path you exported from, e.g. Plone.json.

The exports for members, relations, localroles and translations are linked to in this form but can also be called individually: /@@export_members, /@@export_relations, /@@export_localroles, /@@export_translations, /@@export_ordering, /@@export_discussion.

Import

Use the form with the URL /@@import_content, and upload a json-file that you want to import:

[Screenshot of the import form]

The imports for members, relations, localroles and translations are linked to in this form but can also be called individually: /@@import_members, /@@import_relations, /@@import_localroles, /@@import_translations, /@@import_ordering, /@@import_discussion.

As a last step in a migration there is another view @@reset_dates that resets the modified date on imported content to the date initially contained in the imported json-file. This is necessary since various changes during a migration will likely result in an updated modification date. During import the original date is stored as obj.modification_date_migrated on each new object, and this view applies it.

Export- and import locations

If you select 'Save to file on server', the export view will save json-files in the var directory of your Plone instance, e.g. /var/instance. The import view will look for files under /var/instance/import. These directories will normally differ between Plone instances and may be on different servers.

You can set the environment variable COLLECTIVE_EXPORTIMPORT_CENTRAL_DIRECTORY to use a shared directory on one server or on a network share. With this variable set, collective.exportimport will both save to and load json-files from that directory, so you don't have to move json-files from the export location to the import location. Be aware that the export views will overwrite any existing json-file of the same name.
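
For example, in the shell that starts your instances (the path is only an illustration):

export COLLECTIVE_EXPORTIMPORT_CENTRAL_DIRECTORY=/srv/exportimport-shared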

Use-cases

Migrations

When an in-place migration is not required, you can use this add-on to migrate the most important parts of your site to json and then import them into a new Plone instance of your targeted version:

  • Export content from a Plone site (it supports Plone 4 and 5, Archetypes and Dexterity, Python 2 and 3).
  • Import the exported content into a new site (Plone 5.2+, Dexterity, Python 3)
  • Export and import relations, users and groups with their roles, translations, local roles, ordering, default-pages, comments, portlets and redirects.

How to migrate additional features like Annotations or Marker Interfaces is discussed in the FAQ section.

Other

You can use this add-on to

  • Archive your content as JSON.
  • Export data to prepare a migration to another system.
  • Combine content from multiple Plone sites into one.
  • Import a Plone site as a subsite into another.
  • Import content from other systems as long as it fits the required format.
  • Update or replace existing data.

Details

Export content

Exporting content is basically a wrapper for the serializers of plone.restapi:

from plone.restapi.interfaces import ISerializeToJson
from zope.component import getMultiAdapter

serializer = getMultiAdapter((obj, request), ISerializeToJson)
data = serializer(include_items=False)

Import content

Importing content is an elaborate wrapper for the deserializers of plone.restapi:

from plone.restapi.interfaces import IDeserializeFromJson
from zope.component import getMultiAdapter

new_id = container.invokeFactory(item['@type'], item['id'])
new = container[new_id]
deserializer = getMultiAdapter((new, self.request), IDeserializeFromJson)
new = deserializer(validate_all=False, data=item)

Use for migrations

A main use-case of this package is migration from one Plone version to another.

Exporting Archetypes content and importing it as Dexterity content works fine, but due to changes in field names some settings would get lost. For example, the setting to exclude content from the navigation was renamed from excludeFromNav to exclude_from_nav.

To fix this you can check the checkbox "Modify exported data for migrations". This will modify the data during export:

  • Drop unused data (e.g. next_item and components)
  • Remove all relation fields
  • Change some field names that changed between Archetypes and Dexterity
    • excludeFromNav → exclude_from_nav
    • allowDiscussion → allow_discussion
    • subject → subjects
    • expirationDate → expires
    • effectiveDate → effective
    • creation_date → created
    • modification_date → modified
    • startDate → start
    • endDate → end
    • openEnd → open_end
    • wholeDay → whole_day
    • contactEmail → contact_email
    • contactName → contact_name
    • contactPhone → contact_phone
  • Update view names on Folders and Collections that changed since Plone 4.
  • Export ATTopics and their criteria as Collections with querystrings.
  • Update Collection criteria.
  • Links and images in richtext fields of content and portlets have changed since Plone 4. The view /@@fix_html allows you to fix these.

Control creating imported content

You can choose between four options for how to deal with content that already exists:

  • Skip: Don't import at all
  • Replace: Delete item and create new
  • Update: Reuse and only overwrite imported data
  • Ignore: Create with a new id

Imported content is initially created with invokeFactory, using the portal_type and id of the exported item, before the rest of the data is deserialized. You can set additional values by specifying a dict factory_kwargs that will be passed to the factory. This way you can set values on the imported object that subscribers to IObjectAddedEvent expect to be there.
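
A minimal sketch, assuming factory_kwargs is supplied as a key in the serialized item and that a hypothetical subscriber expects the field language to be set at creation time:

def global_dict_hook(self, item):
    # "language" is only an illustration; pass whatever values your
    # IObjectAddedEvent subscribers expect on the newly created object.
    item["factory_kwargs"] = {"language": item.get("language", "en")}
    return item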

Export versioned content

Exporting versions of Archetypes content does not work because of a bug in plone.restapi (plone/plone.restapi#1335). For the export to work you need to use a version between 7.7.0 and 8.0.0 (if released) or a source-checkout of the branch 7.x.x.

Notes on speed and large migrations

Exporting and importing large amounts of content can take a while. Export is pretty fast but import is constrained by some features of Plone, most importantly versioning:

  • Importing 5000 Folders takes ~5 minutes
  • Importing 5000 Documents takes >25 minutes because of versioning.
  • Importing 5000 Documents without versioning takes ~7 minutes.

During import you can commit after every x items, which will free up memory and disk-space in your TMPDIR (where blobs are added before each commit).

When exporting large numbers of blobs (binary files and images) you will get huge json-files and may run out of memory. You have various options to deal with this. The best way depends on how you are going to import the blobs:

  • Export as download urls: small download, but collective.exportimport cannot import the blobs, so you will need your own import script to download them.
  • Export as base-64 encoded strings: large download, but collective.exportimport can handle the import.
  • Export as blob paths: small download, and collective.exportimport can handle the import, but you need to copy var/blobstorage to the Plone site where you do the import, or set the environment variable COLLECTIVE_EXPORTIMPORT_BLOB_HOME to the old blobstorage path: export COLLECTIVE_EXPORTIMPORT_BLOB_HOME=/path-to-old-instance/var/blobstorage. To export the blob paths you do not need access to the blobs! (See the sketch after this list.)
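
For example, to export only files and images with blobs referenced by blob path, mirroring the ExportAll example further below (portal and request come from the surrounding view):

export_content = api.content.get_view("export_content", portal, request)
request.form["form.submitted"] = True
export_content(
    portal_type=["File", "Image"],
    include_blobs=2,  # export blobs as blob paths
    download_to_server=True,
)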

Format of export and import of content

By default all content is exported to and imported from one large json-file. To inspect such very large json-files without performance issues you can use klogg (https://klogg.filimonov.dev).

Since version 1.10 collective.exportimport also supports exporting and importing each content item as a separate json-file. To use that, select Save each item as a separate file on the server in the form, or specify download_to_server=2 when calling the export in Python. In the import-form you can manually select a directory on the server, or specify server_directory="/mydir" when calling the import in Python.
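
A sketch of both calls in Python (the directory is only an example; portal and request come from the surrounding view):

# Export each item as a separate json-file on the server
export_content = api.content.get_view("export_content", portal, request)
request.form["form.submitted"] = True
export_content(portal_type=["Document"], download_to_server=2)

# Import all json-files from a directory on the server
import_content = api.content.get_view("import_content", portal, request)
request.form["form.submitted"] = True
import_content(server_directory="/mydir", return_json=True)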

Customize export and import

This add-on is designed to be adapted to your requirements and has multiple hooks to make that easy.

To make that easier, there are packages you can reuse to override and extend the export and import: https://github.com/starzel/contentexport and https://github.com/starzel/contentimport. Use these templates and adapt them to your own projects.

Many examples for customizing the export and import are collected in the chapter "FAQ, Tips and Tricks" below.

Note

As a rule of thumb, you should make changes to the data during import unless you need access to the original object for the required changes. One reason is that this way the serialized content in the json-file more closely represents the original data. Another reason is that it allows you to fix issues during the import step you are currently developing (i.e. without having to redo the export).

Export Example

from collective.exportimport.export_content import ExportContent

class CustomExportContent(ExportContent):

    QUERY = {
        'Document': {'review_state': ['published', 'pending']},
    }

    DROP_PATHS = [
        '/Plone/userportal',
        '/Plone/en/obsolete_content',
    ]

    DROP_UIDS = [
        '71e3e0a6f06942fea36536fbed0f6c42',
    ]

    def update(self):
        """Use this to override stuff before the export starts
        (e.g. force a specific language in the request)."""

    def start(self):
        """Hook to do something before export."""

    def finish(self):
        """Hook to do something after export."""

    def global_obj_hook(self, obj):
        """Inspect the content item before serialisation data.
        Bad: Changing the content-item is a horrible idea.
        Good: Return None if you want to skip this particular object.
        """
        return obj

    def global_dict_hook(self, item, obj):
        """Use this to modify or skip the serialized data.
        Return None if you want to skip this particular object.
        """
        return item

    def dict_hook_document(self, item, obj):
        """Use this to modify or skip the serialized data by type.
        Return the modified dict (item) or None if you want to skip this particular object.
        """
        return item

Register it with your own browserlayer to override the default.

<browser:page
    name="export_content"
    for="zope.interface.Interface"
    class=".custom_export.CustomExportContent"
    layer="My.Custom.IBrowserlayer"
    permission="cmf.ManagePortal"
    />

Import Example

from collective.exportimport.import_content import ImportContent

class CustomImportContent(ImportContent):

    CONTAINER = {'Event': '/imported-events'}

    # These fields will be ignored
    DROP_FIELDS = ['relatedItems']

    # Items with these uid will be ignored
    DROP_UIDS = ['04d1477583c74552a7fcd81a9085c620']

    # These paths will be ignored
    DROP_PATHS = ['/Plone/doormat/', '/Plone/import_files/']

    # Default values for some fields
    DEFAULTS = {'which_price': 'normal'}

    def start(self):
        """Hook to do something before importing one file."""

    def finish(self):
        """Hook to do something after importing one file."""

    def global_dict_hook(self, item):
        if isinstance(item.get('description', None), dict):
            item['description'] = item['description']['data']
        if isinstance(item.get('rights', None), dict):
            item['rights'] = item['rights']['data']
        return item

    def dict_hook_customtype(self, item):
        # change the type
        item['@type'] = 'anothertype'
        # drop a field
        item.pop('experiences', None)
        return item

    def handle_file_container(self, item):
        """Use this to specify the container in which to create the item in.
        Return the container for this particular object.
        """
        return self.portal['imported_files']

Register it:

<browser:page
    name="import_content"
    for="zope.interface.Interface"
    class=".custom_import.CustomImportContent"
    layer="My.Custom.IBrowserlayer"
    permission="cmf.ManagePortal"
    />

Automate export and import

Run all exports and save all data in var/instance/:

from plone import api
from Products.Five import BrowserView

class ExportAll(BrowserView):

    def __call__(self):
        export_content = api.content.get_view("export_content", self.context, self.request)
        self.request.form["form.submitted"] = True
        export_content(
            portal_type=["Folder", "Document", "News Item", "File", "Image"],  # only export these
            include_blobs=2,  # Export files and images as blob paths
            download_to_server=True)

        other_exports = [
            "export_relations",
            "export_members",
            "export_translations",
            "export_localroles",
            "export_ordering",
            "export_defaultpages",
            "export_discussion",
            "export_portlets",
            "export_redirects",
        ]
        for name in other_exports:
            view = api.content.get_view(name, self.context, self.request)
            # This saves each export in var/instance/export_xxx.json
            view(download_to_server=True)

        # Important! Redirect to prevent infinite export loop :)
        return self.request.response.redirect(self.context.absolute_url())

Run all imports using the data exported in the example above:

from App.config import getConfiguration
from collective.exportimport.fix_html import fix_html_in_content_fields
from collective.exportimport.fix_html import fix_html_in_portlets
from logging import getLogger
from pathlib import Path
from plone import api
from Products.Five import BrowserView

import transaction

logger = getLogger(__name__)


class ImportAll(BrowserView):

    def __call__(self):
        portal = api.portal.get()
        request = self.request

        # Import content
        view = api.content.get_view("import_content", portal, request)
        request.form["form.submitted"] = True
        request.form["commit"] = 500
        view(server_file="Plone.json", return_json=True)
        transaction.commit()

        # Run all other imports
        other_imports = [
            "relations",
            "members",
            "translations",
            "localroles",
            "ordering",
            "defaultpages",
            "discussion",
            "portlets",
            "redirects",
        ]
        cfg = getConfiguration()
        directory = Path(cfg.clienthome) / "import"
        for name in other_imports:
            view = api.content.get_view(f"import_{name}", portal, request)
            path = Path(directory) / f"export_{name}.json"
            results = view(jsonfile=path.read_text(), return_json=True)
            logger.info(results)
            transaction.commit()

        # Run cleanup steps
        results = fix_html_in_content_fields()
        logger.info("Fixed html for %s content items", results)
        transaction.commit()

        results = fix_html_in_portlets()
        logger.info("Fixed html for %s portlets", results)
        transaction.commit()

        reset_dates = api.content.get_view("reset_dates", portal, request)
        reset_dates()
        transaction.commit()

Note

The views @@export_all and @@import_all are also contained in the helper packages https://github.com/starzel/contentexport and https://github.com/starzel/contentimport.

FAQ, Tips and Tricks

This section covers frequent use-cases and examples for features that are not required for all migrations.

Using global_obj_hook during export

Use global_obj_hook during export to inspect content and decide whether to skip it:

def global_obj_hook(self, obj):
    # Drop subtopics
    if obj.portal_type == "Topic" and obj.__parent__.portal_type == "Topic":
        return

    # Drop files and images from PFG formfolders
    if obj.__parent__.portal_type == "FormFolder":
        return
    return obj

Using dict-hooks during export

Use global_dict_hook during export to inspect content and modify the serialized json. You can also use dict_hook_<somecontenttype> to better structure your code for readability.

Sometimes you need to handle data that you add in global_dict_hook during export with corresponding code in global_obj_hook during import.

Exporting and importing a placeful workflow policy is a good example of that pattern:

Export/Import placeful workflow policy

Export:

def global_dict_hook(self, item, obj):
    if obj.isPrincipiaFolderish and ".wf_policy_config" in obj.keys():
        wf_policy = obj[".wf_policy_config"]
        item["exportimport.workflow_policy"] = {
            "workflow_policy_below": wf_policy.workflow_policy_below,
            "workflow_policy_in": wf_policy.workflow_policy_in,
        }
    return item

Import:

def global_obj_hook(self, obj, item):
    wf_policy = item.get("exportimport.workflow_policy")
    if wf_policy:
        obj.manage_addProduct["CMFPlacefulWorkflow"].manage_addWorkflowPolicyConfig()
        wf_policy_config = obj[".wf_policy_config"]
        wf_policy_config.setPolicyIn(wf_policy["workflow_policy_in"], update_security=True)
        wf_policy_config.setPolicyBelow(wf_policy["workflow_policy_below"], update_security=True)

Using dict-hooks during import

A lot of fixes can be done during import using the global_dict_hook or dict_hook_<contenttype>.

Here we prevent the expiration date from being before the effective date, since that would lead to validation errors during deserialization:

def global_dict_hook(self, item):
    effective = item.get('effective', None)
    expires = item.get('expires', None)
    if effective and expires and expires <= effective:
        item.pop('expires')
    return item

Here we drop empty lines from the creators:

def global_dict_hook(self, item):
    item["creators"] = [i for i in item.get("creators", []) if i]
    return item

This example migrates a PloneHelpCenter to a simple folder/document structure during import. There are a couple more types to handle (as folder or document) but you get the idea, don't you?

def dict_hook_helpcenter(self, item):
    item["@type"] = "Folder"
    item["layout"] = "listing_view"
    return item

def dict_hook_helpcenterglossary(self, item):
    item["@type"] = "Folder"
    item["layout"] = "listing_view"
    return item

def dict_hook_helpcenterinstructionalvideo(self, item):
    item["@type"] = "File"
    if item.get("video_file"):
        item["file"] = item["video_file"]
    return item

def dict_hook_helpcenterlink(self, item):
    item["@type"] = "Link"
    item["remoteUrl"] = item.get("url", None)
    return item

def dict_hook_helpcenterreferencemanualpage(self, item):
    item["@type"] = "Document"
    return item

If you change types during import you need to take care of other places where types are referenced. Examples are collection queries (see "Fixing invalid collection queries" below) or portal-type constraints on folders (stored under the key exportimport.constrains):

PORTAL_TYPE_MAPPING = {
    "Topic": "Collection",
    "FormFolder": "EasyForm",
    "HelpCenter": "Folder",
}

# Types that are allowed in the target site (adapt this to your project)
ALLOWED_TYPES = [
    "Collection",
    "Document",
    "EasyForm",
    "Folder",
]

def global_dict_hook(self, item):
    if item.get("exportimport.constrains"):
        types_fixed = []
        for portal_type in item["exportimport.constrains"]["locally_allowed_types"]:
            if portal_type in PORTAL_TYPE_MAPPING:
                types_fixed.append(PORTAL_TYPE_MAPPING[portal_type])
            elif portal_type in ALLOWED_TYPES:
                types_fixed.append(portal_type)
        item["exportimport.constrains"]["locally_allowed_types"] = list(set(types_fixed))

        types_fixed = []
        for portal_type in item["exportimport.constrains"]["immediately_addable_types"]:
            if portal_type in PORTAL_TYPE_MAPPING:
                types_fixed.append(PORTAL_TYPE_MAPPING[portal_type])
            elif portal_type in ALLOWED_TYPES:
                types_fixed.append(portal_type)
        item["exportimport.constrains"]["immediately_addable_types"] = list(set(types_fixed))
    return item

Change workflow
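
During import you can map review states of your old workflows to those used in the new site: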

REVIEW_STATE_MAPPING = {
    "internal": "published",
    "internally_published": "published",
    "obsolete": "private",
    "hidden": "private",
}

def global_dict_hook(self, item):
    if item.get("review_state") in REVIEW_STATE_MAPPING:
        item["review_state"] = REVIEW_STATE_MAPPING[item["review_state"]]
    return item

Export/Import Annotations

Some core features of Plone (e.g. comments) use annotations to store data. The core features are already covered, but your custom code or community add-ons may use annotations as well. Here is how you can migrate them.

Export: only export those annotations that you really need.

from plone.restapi.interfaces import IJsonCompatible
from zope.annotation.interfaces import IAnnotations

ANNOTATIONS_TO_EXPORT = [
    "syndication_settings",
]
ANNOTATIONS_KEY = 'exportimport.annotations'

class CustomExportContent(ExportContent):

    def global_dict_hook(self, item, obj):
        item = self.export_annotations(item, obj)
        return item

    def export_annotations(self, item, obj):
        results = {}
        annotations = IAnnotations(obj)
        for key in ANNOTATIONS_TO_EXPORT:
            data = annotations.get(key)
            if data:
                results[key] = IJsonCompatible(data, None)
        if results:
            item[ANNOTATIONS_KEY] = results
        return item

Import:

from zope.annotation.interfaces import IAnnotations
ANNOTATIONS_KEY = "exportimport.annotations"

class CustomImportContent(ImportContent):

    def global_obj_hook(self, obj, item):
        item = self.import_annotations(obj, item)
        return item

    def import_annotations(self, obj, item):
        annotations = IAnnotations(obj)
        for key in item.get(ANNOTATIONS_KEY, []):
            annotations[key] = item[ANNOTATIONS_KEY][key]
        return item

Some features also store data in annotations on the portal, e.g. plone.contentrules.localassignments, plone.portlets.categoryblackliststatus, plone.portlets.contextassignments, syndication_settings. Depending on your requirements you may want to export and import those as well.
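
A sketch for exporting such portal annotations, assuming you only need the syndication settings and reusing the IJsonCompatible pattern from above (run it e.g. in the finish hook of your export):

from plone import api
from plone.restapi.interfaces import IJsonCompatible
from zope.annotation.interfaces import IAnnotations

PORTAL_ANNOTATIONS_TO_EXPORT = [
    "syndication_settings",
]

def export_portal_annotations():
    portal = api.portal.get()
    annotations = IAnnotations(portal)
    results = {}
    for key in PORTAL_ANNOTATIONS_TO_EXPORT:
        data = annotations.get(key)
        if data is not None:
            results[key] = IJsonCompatible(data, None)
    return results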

Export/Import Marker Interfaces

Export: You may only want to export the marker-interfaces you need. It is a good idea to inspect a list of all used marker interfaces in a portal before deciding what to migrate.
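
A quick sketch to build such an inventory, e.g. in a debug session; it counts the directly provided interfaces of all content:

from collections import Counter
from plone import api
from zope.interface import directlyProvidedBy

catalog = api.portal.get_tool("portal_catalog")
counter = Counter()
for brain in catalog():
    obj = brain.getObject()
    for iface in directlyProvidedBy(obj):
        counter[iface.__identifier__] += 1
for name, amount in counter.most_common():
    print(name, amount)

The export itself then only keeps the interfaces you selected: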

from zope.interface import directlyProvidedBy

MARKER_INTERFACES_TO_EXPORT = [
    "collective.easyslider.interfaces.ISliderPage",
    "plone.app.layout.navigation.interfaces.INavigationRoot",
]
MARKER_INTERFACES_KEY = "exportimport.marker_interfaces"

class CustomExportContent(ExportContent):

    def global_dict_hook(self, item, obj):
        item = self.export_marker_interfaces(item, obj)
        return item

    def export_marker_interfaces(self, item, obj):
        interfaces = [i.__identifier__ for i in directlyProvidedBy(obj)]
        interfaces = [i for i in interfaces if i in MARKER_INTERFACES_TO_EXPORT]
        if interfaces:
            item[MARKER_INTERFACES_KEY] = interfaces
        return item

Import:

from logging import getLogger
from plone.dexterity.utils import resolveDottedName
from zope.interface import alsoProvides

logger = getLogger(__name__)

MARKER_INTERFACES_KEY = "exportimport.marker_interfaces"

class CustomImportContent(ImportContent):

    def global_obj_hook_before_deserializing(self, obj, item):
        """Apply marker interfaces before deserializing."""
        for iface_name in item.pop(MARKER_INTERFACES_KEY, []):
            try:
                iface = resolveDottedName(iface_name)
                if not iface.providedBy(obj):
                    alsoProvides(obj, iface)
                    logger.info("Applied marker interface %s to %s", iface_name, obj.absolute_url())
            except ModuleNotFoundError:
                pass
        return obj, item

Skip versioning during import

The event handlers of versioning can seriously slow down your imports. It is a good idea to disable versioning before the import:

VERSIONED_TYPES = [
    "Document",
    "News Item",
    "Event",
    "Link",
]

def start(self):
    self.items_without_parent = []
    portal_types = api.portal.get_tool("portal_types")
    for portal_type in VERSIONED_TYPES:
        fti = portal_types.get(portal_type)
        behaviors = list(fti.behaviors)
        if 'plone.versioning' in behaviors:
            logger.info(f"Disable versioning for {portal_type}")
            behaviors.remove('plone.versioning')
        fti.behaviors = behaviors

Re-enable versioning and create initial versions after all imports and fixes are done, e.g. in the view @@import_all:

from Products.CMFEditions.interfaces.IModifier import FileTooLargeToVersionError

VERSIONED_TYPES = [
    "Document",
    "News Item",
    "Event",
    "Link",
]

class ImportAll(BrowserView):

    def __call__(self):
        # ... run all the imports shown above, then:

        # re-enable versioning
        portal_types = api.portal.get_tool("portal_types")
        for portal_type in VERSIONED_TYPES:
            fti = portal_types.get(portal_type)
            behaviors = list(fti.behaviors)
            if "plone.versioning" not in behaviors:
                behaviors.append("plone.versioning")
                logger.info(f"Enable versioning for {portal_type}")
            if "plone.locking" not in behaviors:
                behaviors.append("plone.locking")
                logger.info(f"Enable locking for {portal_type}")
            fti.behaviors = behaviors
        transaction.get().note("Re-enabled versioning")
        transaction.commit()

        # create initial version for all versioned types
        logger.info("Creating initial versions")
        portal_repository = api.portal.get_tool("portal_repository")
        brains = api.content.find(portal_type=VERSIONED_TYPES)
        total = len(brains)
        for index, brain in enumerate(brains):
            obj = brain.getObject()
            try:
                portal_repository.save(obj=obj, comment="Imported Version")
            except FileTooLargeToVersionError:
                pass
            if not index % 1000:
                msg = f"Created versions for {index} of {total} items."
                logger.info(msg)
                transaction.get().note(msg)
                transaction.commit()
        msg = "Created initial versions"
        transaction.get().note(msg)
        transaction.commit()

Dealing with validation errors

Sometimes you get validation errors during import because the data cannot be validated. That can happen when the options of a field are generated from content in the site: you cannot be sure that all options already exist in the portal while importing the content.

It may also happen when you have validators that rely on content or configuration that does not exist at import time.

Note

For relation fields this is not necessary since relations are imported after content anyway!

There are two ways to handle these issues:

  • Use a simple setter bypassing the validation used by the restapi
  • Defer the import until all other imports were run

Use a simple setter

You need to specify which content types and fields you want to handle this way. The values are moved to a key that the normal import ignores, and are set with setattr() before the rest of the data is deserialized.

SIMPLE_SETTER_FIELDS = {
    "ALL": ["some_shared_field"],
    "CollaborationFolder": ["allowedPartnerDocTypes"],
    "DocType": ["automaticTransferTargets"],
    "DPDocument": ["scenarios"],
    "DPEvent" : ["Status"],
}

class CustomImportContent(ImportContent):

    def global_dict_hook(self, item):
        simple = {}
        for fieldname in SIMPLE_SETTER_FIELDS.get("ALL", []):
            if fieldname in item:
                value = item.pop(fieldname)
                if value:
                    simple[fieldname] = value
        for fieldname in SIMPLE_SETTER_FIELDS.get(item["@type"], []):
            if fieldname in item:
                value = item.pop(fieldname)
                if value:
                    simple[fieldname] = value
        if simple:
            item["exportimport.simplesetter"] = simple
        return item

    def global_obj_hook_before_deserializing(self, obj, item):
        """Hook to modify the created obj before deserializing the data."""
        # import simplesetter data before the rest
        for fieldname, value in item.get("exportimport.simplesetter", {}).items():
            setattr(obj, fieldname, value)
        return obj, item

Note

Using global_obj_hook_before_deserializing makes sure that the data is already there when the event handlers run after the import.

Defer import

You can also wait until all content is imported before setting the values of these fields. Again, you need to figure out which fields of which types you want to handle this way.

Here the data is stored in an annotation on the imported object, from which it is later read. This example also supports setting some data with setattr without validating it:

from plone.restapi.interfaces import IDeserializeFromJson
from zope.annotation.interfaces import IAnnotations
from zope.component import getMultiAdapter

DEFERRED_KEY = "exportimport.deferred"
DEFERRED_FIELD_MAPPING = {
    "talk": ["somefield"],
    "speaker": [
        "custom_field",
        "another_field",
    ]
}
SIMPLE_SETTER_FIELDS = {"custom_type": ["another_field"]}

class CustomImportContent(ImportContent):

    def global_dict_hook(self, item):
        # Move deferred values to a different key to not deserialize.
        # This could also be done during export.
        item[DEFERRED_KEY] = {}
        for fieldname in DEFERRED_FIELD_MAPPING.get(item["@type"], []):
            if item.get(fieldname):
                item[DEFERRED_KEY][fieldname] = item.pop(fieldname)
        return item

    def global_obj_hook(self, obj, item):
        # Store deferred data in an annotation.
        deferred = item.get(DEFERRED_KEY, {})
        if deferred:
            annotations = IAnnotations(obj)
            annotations[DEFERRED_KEY] = {}
            for key, value in deferred.items():
                annotations[DEFERRED_KEY][key] = value

You then need a new step in the migration to move the deferred values from the annotation to the field:

class ImportDeferred(BrowserView):

    def __call__(self):
        # This example reuses the form export_other.pt from collective.exportimport
        self.title = "Import deferred data"
        if not self.request.form.get("form.submitted", False):
            return self.index()
        portal = api.portal.get()
        self.results = []
        for brain in api.content.find(portal_type=list(DEFERRED_FIELD_MAPPING)):
            obj = brain.getObject()
            self.import_deferred(obj)
        api.portal.show_message(f"Imported deferred data for {len(self.results)} items!", self.request)
        return self.index()

    def import_deferred(self, obj):
        annotations = IAnnotations(obj, {})
        deferred = annotations.get(DEFERRED_KEY, None)
        if not deferred:
            return
        # Shortcut for simple fields (e.g. storing strings, uuids etc.)
        for fieldname in SIMPLE_SETTER_FIELDS.get(obj.portal_type, []):
            value = deferred.pop(fieldname, None)
            if value:
                setattr(obj, fieldname, value)
        if not deferred:
            return
        # This approach validates the values and converts more complex data
        deserializer = getMultiAdapter((obj, self.request), IDeserializeFromJson)
        try:
            obj = deserializer(validate_all=False, data=deferred)
        except Exception as e:
            logger.info("Error while importing deferred data for %s", obj.absolute_url(), exc_info=True)
            logger.info("Data: %s", deferred)
        else:
            self.results.append(obj.absolute_url())
        # cleanup
        del annotations[DEFERRED_KEY]

This additional view obviously needs to be registered:

<browser:page
    name="import_deferred"
    for="zope.interface.Interface"
    class=".import_content.ImportDeferred"
    template="export_other.pt"
    permission="cmf.ManagePortal"
    />

Handle LinguaPlone content

Export:

def global_dict_hook(self, item, obj):
    # Find the language of the nearest parent that has one.
    # Useful for LinguaPlone sites where some content is language-independent.
    parent = obj.__parent__
    for ancestor in parent.aq_chain:
        if IPloneSiteRoot.providedBy(ancestor):
            # keep language for root content
            nearest_ancestor_lang = item["language"]
            break
        if getattr(ancestor, "getLanguage", None) and ancestor.getLanguage():
            nearest_ancestor_lang = ancestor.getLanguage()
            item["parent"]["language"] = nearest_ancestor_lang
            break

    # This forces "wrong" languages to the nearest parents language
    if "language" in item and item["language"] != nearest_ancestor_lang:
        logger.info(u"Forcing %s (was %s) for %s %s ", nearest_ancestor_lang, item["language"], item["@type"], item["@id"])
        item["language"] = nearest_ancestor_lang

    # set missing language
    if not item.get("language"):
        item["language"] = nearest_ancestor_lang

    # Add info on translations to help find the right container.
    # Usually this is done by export_translations, but when migrating from
    # LinguaPlone to plone.app.multilingual you sometimes want to check the
    # translation info during import.
    if getattr(obj.aq_base, "getTranslations", None) is not None:
        translations = obj.getTranslations()
        if translations:
            item["translation"] = {}
            for lang in translations:
                uuid = IUUID(translations[lang][0], None)
                if uuid == item["UID"]:
                    continue
                translation = translations[lang][0]
                if not lang:
                    lang = "no_language"
                item["translation"][lang] = translation.absolute_url()

Import:

def global_dict_hook(self, item):

    # Adapt this to your site
    languages = ["en", "fr", "de"]
    default_language = "en"
    portal_id = "Plone"

    # No language => lang of parent or default
    if item.get("language") not in languages:
        if item["parent"].get("language"):
            item["language"] = item["parent"]["language"]
        else:
            item["language"] = default_language

    lang = item["language"]

    if item["parent"].get("language") != item["language"]:
        logger.debug(f"Inconsistent lang: item is {lang}, parent is {item['parent'].get('language')} for {item['@id']}")

    # Move the item to the correct language root folder.
    # This is only relevant for items in the site root; the containers of
    # most items are looked up by the uuid of their old parent.
    url = item["@id"]
    parent_url = item["parent"]["@id"]

    url = url.replace(f"/{portal_id}/", f"/{portal_id}/{lang}/", 1)
    parent_url = parent_url.replace(f"/{portal_id}", f"/{portal_id}/{lang}", 1)

    item["@id"] = url
    item["parent"]["@id"] = parent_url

    return item

Alternative ways to handle items without parent

Often it is better to export and log items for which no container could be found, instead of re-creating the original structure.

from App.config import getConfiguration
from logging import getLogger
from plone import api

import json
import os

logger = getLogger(__name__)


def update(self):
    self.items_without_parent = []

def create_container(self, item):
    # Override create_container to never create parents
    self.items_without_parent.append(item)

def finish(self):
    # export content without parents
    if self.items_without_parent:
        data = json.dumps(self.items_without_parent, sort_keys=True, indent=4)
        number = len(self.items_without_parent)
        cfg = getConfiguration()
        filename = 'content_without_parent.json'
        filepath = os.path.join(cfg.clienthome, filename)
        with open(filepath, 'w') as f:
            f.write(data)
        msg = u"Saved {} items without parent to {}".format(number, filepath)
        logger.info(msg)
        api.portal.show_message(msg, self.request)

Export/Import Zope Users

By default only users and groups stored in Plone are exported/imported. You can export/import Zope users like this.

Export

from collective.exportimport.export_other import BaseExport
from plone import api
from plone.restapi.serializer.converters import json_compatible

class ExportZopeUsers(BaseExport):

    AUTO_ROLES = ["Authenticated"]

    def __call__(self, download_to_server=False):
        self.title = "Export Zope users"
        self.download_to_server = download_to_server
        portal = api.portal.get()
        app = portal.__parent__
        self.acl = app.acl_users
        self.pms = api.portal.get_tool("portal_membership")
        data = self.all_zope_users()
        self.download(data)

    def all_zope_users(self):
        results = []
        for user in self.acl.searchUsers():
            data = self._getUserData(user["userid"])
            data['title'] = user['title']
            results.append(data)
        return results

    def _getUserData(self, userId):
        member = self.pms.getMemberById(userId)
        roles = [
            role
            for role in member.getRoles()
            if role not in self.AUTO_ROLES
        ]
        # userid, password, roles
        props = {
            "username": userId,
            "password": json_compatible(self._getUserPassword(userId)),
            "roles": json_compatible(roles),
        }
        return props

    def _getUserPassword(self, userId):
        users = self.acl.users
        passwords = users._user_passwords
        password = passwords.get(userId, "")
        return password

Import:

from logging import getLogger
from plone import api
from Products.Five import BrowserView
from ZPublisher.HTTPRequest import FileUpload

import json

logger = getLogger(__name__)


class ImportZopeUsers(BrowserView):

    def __call__(self, jsonfile=None, return_json=False):
        if jsonfile:
            self.portal = api.portal.get()
            status = "success"
            try:
                if isinstance(jsonfile, str):
                    return_json = True
                    data = json.loads(jsonfile)
                elif isinstance(jsonfile, FileUpload):
                    data = json.loads(jsonfile.read())
                else:
                    raise ValueError("Data is neither text nor upload.")
            except Exception as e:
                status = "error"
                msg = u"Failure while uploading: {}".format(e)
                logger.error(e)
                api.portal.show_message(msg, request=self.request)
            else:
                members = self.import_members(data)
                msg = u"Imported {} members".format(members)
                api.portal.show_message(msg, self.request)
            if return_json:
                msg = {"state": status, "msg": msg}
                return json.dumps(msg)

        return self.index()

    def import_members(self, data):
        app = self.portal.__parent__
        acl = app.acl_users
        counter = 0
        for item in data:
            username = item["username"]
            password = item.pop("password")
            roles = item.pop("roles", [])
            if not username or not password or not roles:
                continue
            title = item.pop("title", None)
            acl.users.addUser(username, title, password)
            for role in roles:
                acl.roles.assignRoleToPrincipal(role, username)
            counter += 1
        return counter

Export/Import properties, registry-settings and installed add-ons

When you migrate multiple similar sites that are configured manually, it can be useful to export and import configuration that was set by hand.

Export/Import installed settings and add-ons

This example exports selected settings and installed add-ons from a Plone 4.3 site and imports them into the new site.

Export:

from collective.exportimport.export_other import BaseExport
from logging import getLogger
from plone import api
from plone.restapi.serializer.converters import json_compatible

logger = getLogger(__name__)


class ExportSettings(BaseExport):
    """Export various settings for haiku sites
    """

    def __call__(self, download_to_server=False):
        self.title = "Export installed add-ons various settings"
        self.download_to_server = download_to_server
        if not self.request.form.get("form.submitted", False):
            return self.index()

        data = self.export_settings()
        self.download(data)

    def export_settings(self):
        results = {}
        addons = []
        qi = api.portal.get_tool("portal_quickinstaller")
        for product in qi.listInstalledProducts():
            if product["id"].startswith("myproject."):
                addons.append(product["id"])
        results["addons"] = addons

        portal = api.portal.get()
        registry = {}
        registry["plone.email_from_name"] = portal.getProperty('email_from_name', '')
        registry["plone.email_from_address"] = portal.getProperty('email_from_address', '')
        registry["plone.smtp_host"] = getattr(portal.MailHost, 'smtp_host', '')
        registry["plone.smtp_port"] = int(getattr(portal.MailHost, 'smtp_port', 25))
        registry["plone.smtp_userid"] = portal.MailHost.get('smtp_user_id')
        registry["plone.smtp_pass"] = portal.MailHost.get('smtp_pass')
        registry["plone.site_title"] = portal.title

        portal_properties = api.portal.get_tool("portal_properties")
        iprops = portal_properties.imaging_properties
        registry["plone.allowed_sizes"] = iprops.getProperty('allowed_sizes')
        registry["plone.quality"] = iprops.getProperty('quality')
        site_props = portal_properties.site_properties
        if site_props.hasProperty("webstats_js"):
            registry["plone.webstats_js"] = site_props.webstats_js
        results["registry"] = json_compatible(registry)
        return results

Import:

The import installs the add-ons and loads the settings into the registry. Since Plone 5, portal_properties is no longer used, so the values are written to the corresponding registry records instead.

from logging import getLogger
from plone import api
from plone.registry.interfaces import IRegistry
from Products.CMFPlone.utils import get_installer
from Products.Five import BrowserView
from zope.component import getUtility
from ZPublisher.HTTPRequest import FileUpload

import json

logger = getLogger(__name__)

class ImportSettings(BrowserView):
    """Import various settings"""

    def __call__(self, jsonfile=None, return_json=False):
        if jsonfile:
            self.portal = api.portal.get()
            status = "success"
            try:
                if isinstance(jsonfile, str):
                    return_json = True
                    data = json.loads(jsonfile)
                elif isinstance(jsonfile, FileUpload):
                    data = json.loads(jsonfile.read())
                else:
                    raise ValueError("Data is neither text nor upload.")
            except Exception as e:
                status = "error"
                msg = "Failure while uploading: {}".format(e)
                logger.error(e)
                api.portal.show_message(msg, request=self.request)
            else:
                self.import_settings(data)
                msg = "Imported addons and settings"
                api.portal.show_message(msg, self.request)
            if return_json:
                msg = {"state": status, "msg": msg}
                return json.dumps(msg)

        return self.index()

    def import_settings(self, data):
        installer = get_installer(self.context)
        for addon in data["addons"]:
            if not installer.is_product_installed(addon) and installer.is_product_installable(addon):
                installer.install_product(addon)
                logger.info(f"Installed addon {addon}")
        registry = getUtility(IRegistry)
        for key, value in data["registry"].items():
            registry[key] = value
            logger.info(f"Imported record {key}: {value}")

Export/Import registry settings

The pull request #130 has views @@export_registry and @@import_registry. These views export and import registry records that do not use the default setting specified in the schema of that registry record. The export alone can also be useful to figure out which settings were modified for a site.

That code will probably not be merged but you can use it in your own projects.

Migrate PloneFormGen to Easyform

To be able to export PloneFormGen forms as EasyForm you should use the branch migration_features_1.x of collective.easyform in your old site. EasyForm does not need to be installed; we only need the methods fields_model and actions_model.

Export:

def dict_hook_formfolder(self, item, obj):
    item["@type"] = "EasyForm"
    item["is_folderish"] = False

    from collective.easyform.migration.fields import fields_model
    from collective.easyform.migration.actions import actions_model

    # this does most of the heavy lifting...
    item["fields_model"] = fields_model(obj)
    item["actions_model"] = actions_model(obj)

    # handle thankspage
    pfg_thankspage = obj.get(obj.getThanksPage(), None)
    if pfg_thankspage:
        item["thankstitle"] = pfg_thankspage.title
        item["thanksdescription"] = pfg_thankspage.Description()
        item["showAll"] = pfg_thankspage.showAll
        item["showFields"] = pfg_thankspage.showFields
        item["includeEmpties"] = pfg_thankspage.includeEmpties
        item["thanksPrologue"] = json_compatible(pfg_thankspage.thanksPrologue.raw)
        item["thanksEpilogue"] = json_compatible(pfg_thankspage.thanksEpilogue.raw)

    # optional
    item["exportimport._inputStorage"] = self.export_saved_data(obj)

    # Drop some PFG fields no longer needed
    obsolete_fields = [
        "layout",
        "actionAdapter",
        "checkAuthenticator",
        "constrainTypesMode",
        "location",
        "thanksPage",
    ]
    for key in obsolete_fields:
        item.pop(key, None)

    # optional: disable tabs for imported forms
    item["form_tabbing"] = False

    # fix some custom validators
    replace_mapping = {
        "request.form['": "request.form['form.widgets.",
        "request.form.get('": "request.form.get('form.widgets.",
        "member and member.id or ''": "member and member.getProperty('id', '') or ''",
    }

    # fix overrides in actions and fields to use form.widgets.xyz instead of xyz
    for schema in ["actions_model", "fields_model"]:
        for old, new in replace_mapping.items():
            if old in item[schema]:
                item[schema] = item[schema].replace(old, new)

        # add your own fields if you have these issues...
        for fieldname in [
            "email",
            "replyto",
        ]:
            if "request/form/{}".format(fieldname) in item[schema]:
                item[schema] = item[schema].replace("request/form/{}".format(fieldname), "python: request.form.get('form.widgets.{}')".format(fieldname))

    return item

def export_saved_data(self, obj):
    actions = {}
    for data_adapter in obj.objectValues("FormSaveDataAdapter"):
        data_adapter_name = data_adapter.getId()
        actions[data_adapter_name] = {}
        cols = data_adapter.getColumnNames()
        column_count_mismatch = False
        for idx, row in enumerate(data_adapter.getSavedFormInput()):
            if len(row) != len(cols):
                column_count_mismatch = True
                logger.debug("Column count mismatch at row %s", idx)
                continue
            data = {}
            for key, value in zip(cols, row):
                data[key] = json_compatible(value)
            id_ = int(time() * 1000)
            while id_ in actions[data_adapter_name]:  # avoid collisions during export
                id_ += 1
            data["id"] = id_
            actions[data_adapter_name][id_] = data
        if column_count_mismatch:
            logger.info(
                "Number of columns does not match for all rows. Some data were skipped in "
                "data adapter %s/%s",
                "/".join(obj.getPhysicalPath()),
                data_adapter_name,
            )
    return actions

Import exported PloneFormGen data into Easyform:

def obj_hook_easyform(self, obj, item):
    if not item.get("exportimport._inputStorage"):
        return
    from collective.easyform.actions import SavedDataBTree
    from persistent.mapping import PersistentMapping
    if not hasattr(obj, '_inputStorage'):
        obj._inputStorage = PersistentMapping()
    for name, data in item["exportimport._inputStorage"].items():
        obj._inputStorage[name] = SavedDataBTree()
        for key, row in data.items():
            obj._inputStorage[name][int(key)] = row

Export and import collective.cover content

Export:

from collective.cover.interfaces import ICover
from collective.exportimport.serializer import get_dx_blob_path
from plone.app.textfield.value import RichTextValue
from plone.namedfile.file import NamedBlobImage
from plone.restapi.interfaces import IJsonCompatible
from z3c.relationfield import RelationValue
from zope.annotation.interfaces import IAnnotations

def global_dict_hook(self, item, obj):
    item = self.handle_cover(item, obj)
    return item

def handle_cover(self, item, obj):
    if ICover.providedBy(obj):
        item['tiles'] = {}
        annotations = IAnnotations(obj)
        for tile in obj.get_tiles():
            annotation_key = 'plone.tiles.data.{}'.format(tile['id'])
            annotation = annotations.get(annotation_key, None)
            if annotation is None:
                continue
            tile_data = self.serialize_tile(annotation)
            tile_data['type'] = tile['type']
            item['tiles'][tile['id']] = tile_data
    return item

def serialize_tile(self, annotation):
    data = {}
    for key, value in annotation.items():
        if isinstance(value, RichTextValue):
            value = value.raw
        elif isinstance(value, RelationValue):
            value = value.to_object.UID()
        elif isinstance(value, NamedBlobImage):
            blobfilepath = get_dx_blob_path(value)
            if not blobfilepath:
                continue
            value = {
                "filename": value.filename,
                "content-type": value.contentType,
                "size": value.getSize(),
                "blob_path": blobfilepath,
            }
        data[key] = IJsonCompatible(value, None)
    return data

Import:

from collections import defaultdict
from collective.exportimport.import_content import get_absolute_blob_path
from logging import getLogger
from plone.app.textfield.interfaces import IRichText
from plone.app.textfield.interfaces import IRichTextValue
from plone.app.textfield.value import RichTextValue
from plone.namedfile.file import NamedBlobImage
from plone.namedfile.interfaces import INamedBlobImageField
from plone.tiles.interfaces import ITileType
from zope.annotation.interfaces import IAnnotations
from zope.component import getUtilitiesFor
from zope.schema import getFieldsInOrder

logger = getLogger(__name__)

COVER_CONTENT = [
    "collective.cover.content",
]

def global_obj_hook(self, obj, item):
    if item["@type"] in COVER_CONTENT and "tiles" in item:
        item = self.import_tiles(obj, item)

def import_tiles(self, obj, item):
    RICHTEXT_TILES = defaultdict(list)
    IMAGE_TILES = defaultdict(list)
    for tile_name, tile_type in getUtilitiesFor(ITileType):
        for fieldname, field in getFieldsInOrder(tile_type.schema):
            if IRichText.providedBy(field):
                RICHTEXT_TILES[tile_name].append(fieldname)
            if INamedBlobImageField.providedBy(field):
                IMAGE_TILES[tile_name].append(fieldname)

    annotations = IAnnotations(obj)
    prefix = "plone.tiles.data."
    for uid, tile in item["tiles"].items():
        # TODO: Maybe create all tiles that do not need to be deferred?
        key = prefix + uid
        tile_name = tile.pop("type", None)
        # first set raw data
        annotations[key] = item["tiles"][uid]
        for fieldname in RICHTEXT_TILES.get(tile_name, []):
            raw = annotations[key][fieldname]
            if raw is not None and not IRichTextValue.providedBy(raw):
                annotations[key][fieldname] = RichTextValue(raw, "text/html", "text/x-html-safe")
        for fieldname in IMAGE_TILES.get(tile_name, []):
            data = annotations[key][fieldname]
            if data is not None:
                blob_path = data.get("blob_path")
                if not blob_path:
                    continue

                abs_blob_path = get_absolute_blob_path(obj, blob_path)
                if not abs_blob_path:
                    logger.info("Blob path %s for tile %s of %s %s does not exist!", blob_path, tile, obj.portal_type, obj.absolute_url())
                    continue
                # Determine the class to use: file or image.
                filename = data["filename"]
                content_type = data["content-type"]

                # Write the field.
                with open(abs_blob_path, "rb") as myfile:
                    blobdata = myfile.read()
                image = NamedBlobImage(
                    data=blobdata,
                    contentType=content_type,
                    filename=filename,
                )
                annotations[key][fieldname] = image
    return item

Fixing invalid collection queries

Some queries changed between Plone 4 and 5. The following hooks fix such issues.

The actual migration of Topics to Collections in collective.exportimport.serializer.SerializeTopicToJson does not (yet) take care of that.

class CustomImportContent(ImportContent):

    def global_dict_hook(self, item):
        if item["@type"] in ["Collection", "Topic"]:
            item = self.fix_query(item)
        return item

    def fix_query(self, item):
        item["@type"] = "Collection"
        query = item.pop("query", [])
        if not query:
            logger.info("Drop item without query: %s", item["@id"])
            return

        fixed_query = []
        indexes_to_fix = [
            "portal_type",
            "review_state",
            "Creator",
            "Subject",
        ]
        operator_mapping = {
            # old -> new
            "plone.app.querystring.operation.selection.is":
                "plone.app.querystring.operation.selection.any",
            "plone.app.querystring.operation.string.is":
                "plone.app.querystring.operation.selection.any",
        }

        for crit in query:
            if crit["i"] == "portal_type" and len(crit["v"]) > 30:
                # Criterion is all types
                continue

            if crit["o"].endswith("relativePath") and crit["v"] == "..":
                # relativePath no longer accepts ..
                crit["v"] = "..::1"

            if crit["i"] in indexes_to_fix:
                for old_operator, new_operator in operator_mapping.items():
                    if crit["o"] == old_operator:
                        crit["o"] = new_operator

            if crit["i"] == "portal_type":
                # Some types may have changed their names
                fixed_types = []
                for portal_type in crit["v"]:
                    fixed_type = PORTAL_TYPE_MAPPING.get(portal_type, portal_type)
                    fixed_types.append(fixed_type)
                crit["v"] = list(set(fixed_types))

            if crit["i"] == "review_state":
                # Review states may have changed their names
                fixed_states = []
                for review_state in crit["v"]:
                    fixed_state = REVIEW_STATE_MAPPING.get(review_state, review_state)
                    fixed_states.append(fixed_state)
                crit["v"] = list(set(fixed_states))

            if crit["o"] == "plone.app.querystring.operation.string.currentUser":
                crit["v"] = ""

            fixed_query.append(crit)
        item["query"] = fixed_query

        if not item["query"]:
            logger.info("Drop collection without query: %s", item["@id"])
            return
        return item
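
For illustration, the operator mapping turns a Plone 4 criterion like this (values are made up):

{
    "i": "portal_type",
    "o": "plone.app.querystring.operation.selection.is",
    "v": ["News Item"],
}

into this:

{
    "i": "portal_type",
    "o": "plone.app.querystring.operation.selection.any",
    "v": ["News Item"],
}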

Migrate to Volto

You can reuse the migration-code provided by @@migrate_to_volto in plone.volto in a migration. The following example (used for migrating https://plone.org to Volto) can be used to migrate a site from any older version to Plone 6 with Volto.

You need to have the Blocks Conversion Tool (https://github.com/plone/blocks-conversion-tool) running that takes care of migrating richtext-values to Volto-blocks.

See https://6.docs.plone.org/backend/upgrading/version-specific-migration/migrate-to-volto.html for more details on the changes that the migration to Volto makes.

from App.config import getConfiguration
from bs4 import BeautifulSoup
from collective.exportimport.fix_html import fix_html_in_content_fields
from collective.exportimport.fix_html import fix_html_in_portlets
from contentimport.interfaces import IContentimportLayer
from logging import getLogger
from pathlib import Path
from plone import api
from plone.volto.browser.migrate_to_volto import migrate_richtext_to_blocks
from plone.volto.setuphandlers import add_behavior
from plone.volto.setuphandlers import remove_behavior
from Products.CMFPlone.utils import get_installer
from Products.Five import BrowserView
from zope.interface import alsoProvides

import requests
import transaction

logger = getLogger(__name__)

DEFAULT_ADDONS = []


class ImportAll(BrowserView):

    def __call__(self):

        request = self.request

        # Check if Blocks-conversion-tool is running
        headers = {
            "Accept": "application/json",
            "Content-Type": "application/json",
        }
        r = requests.post(
            "http://localhost:5000/html", headers=headers, json={"html": "<p>text</p>"}
        )
        r.raise_for_status()

        # Submit a simple form template to trigger the import
        if not request.form.get("form.submitted", False):
            return self.index()

        portal = api.portal.get()
        alsoProvides(request, IContentimportLayer)

        installer = get_installer(portal)
        if not installer.is_product_installed("contentimport"):
            installer.install_product("contentimport")

        # install required add-ons
        for addon in DEFAULT_ADDONS:
            if not installer.is_product_installed(addon):
                installer.install_product(addon)

        # Fake the target being a classic site even though plone.volto is installed...
        # 1. Allow Folders and Collections (they are disabled in Volto by default)
        portal_types = api.portal.get_tool("portal_types")
        portal_types["Collection"].global_allow = True
        portal_types["Folder"].global_allow = True
        # 2. Enable richtext behavior (otherwise no text will be imported)
        for type_ in ["Document", "News Item", "Event"]:
            add_behavior(type_, "plone.richtext")

        transaction.commit()
        cfg = getConfiguration()
        directory = Path(cfg.clienthome) / "import"

        # Import content
        view = api.content.get_view("import_content", portal, request)
        request.form["form.submitted"] = True
        request.form["commit"] = 500
        view(server_file="Plone.json", return_json=True)
        transaction.commit()

        # Run all other imports
        other_imports = [
            "relations",
            "members",
            "translations",
            "localroles",
            "ordering",
            "defaultpages",
            "discussion",
            "portlets",  # not really useful in Volto
            "redirects",
        ]
        for name in other_imports:
            view = api.content.get_view(f"import_{name}", portal, request)
            path = directory / f"export_{name}.json"
            if path.exists():
                results = view(jsonfile=path.read_text(), return_json=True)
                logger.info(results)
                transaction.get().note(f"Finished import_{name}")
                transaction.commit()
            else:
                logger.info(f"Missing file: {path}")

        # Optional: Run html-fixers on richtext
        fixers = [anchor_fixer]
        results = fix_html_in_content_fields(fixers=fixers)
        msg = "Fixed html for {} content items".format(results)
        logger.info(msg)
        transaction.get().note(msg)
        transaction.commit()

        results = fix_html_in_portlets()
        msg = "Fixed html for {} portlets".format(results)
        logger.info(msg)
        transaction.get().note(msg)
        transaction.commit()

        view = api.content.get_view("updateLinkIntegrityInformation", portal, request)
        results = view.update()
        msg = f"Updated linkintegrity for {results} items"
        logger.info(msg)
        transaction.get().note(msg)
        transaction.commit()

        # Rebuilding the catalog is necessary to prevent issues later on
        catalog = api.portal.get_tool("portal_catalog")
        logger.info("Rebuilding catalog...")
        catalog.clearFindAndRebuild()
        msg = "Finished rebuilding catalog!"
        logger.info(msg)
        transaction.get().note(msg)
        transaction.commit()

        # This uses the blocks-conversion-tool to migrate to blocks
        logger.info("Start migrating richtext to blocks...")
        migrate_richtext_to_blocks()
        msg = "Finished migrating richtext to blocks"
        transaction.get().note(msg)
        transaction.commit()

        # Reuse the migration-form from plone.volto to do some more tasks
        view = api.content.get_view("migrate_to_volto", portal, request)
        # Yes, we want to migrate default pages
        view.migrate_default_pages = True
        view.slate = True
        logger.info("Start migrating Folders to Documents...")
        view.do_migrate_folders()
        msg = "Finished migrating Folders to Documents!"
        transaction.get().note(msg)
        transaction.commit()

        logger.info("Start migrating Collections to Documents...")
        view.migrate_collections()
        msg = "Finished migrating Collections to Documents!"
        transaction.get().note(msg)
        transaction.commit()

        reset_dates = api.content.get_view("reset_dates", portal, request)
        reset_dates()
        transaction.commit()

        # Disallow folders and collections again
        portal_types["Collection"].global_allow = False
        portal_types["Folder"].global_allow = False

        # Disable richtext behavior again
        for type_ in ["Document", "News Item", "Event"]:
            remove_behavior(type_, "plone.richtext")

        return request.response.redirect(portal.absolute_url())


def anchor_fixer(text, obj=None):
    """Remove anchors since they are not supported by Volto yet"""
    soup = BeautifulSoup(text, "html.parser")
    for link in soup.find_all("a"):
        if not link.get("href") and not link.text:
            # drop empty links (e.g. anchors)
            link.decompose()
        elif not link.get("href") and link.text:
            # drop links without a href but keep the text
            link.unwrap()
    return soup.decode()
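
A fixer is any callable that takes the html (and optionally the object) and returns the fixed html. As a further, hypothetical example, a fixer that adds Plone's listing class to every table:

def table_class_fixer(text, obj=None):
    """Hypothetical example: make sure every table gets the listing class"""
    soup = BeautifulSoup(text, "html.parser")
    for table in soup.find_all("table"):
        css_classes = table.get("class", [])
        if "listing" not in css_classes:
            table["class"] = css_classes + ["listing"]
    return soup.decode()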

Migrate very old Plone Versions with data created by collective.jsonify

Versions older than Plone 4 do not support plone.restapi, which collective.exportimport requires to serialize the content.

To migrate Plone 1, 2 and 3 to Plone 6 you can use collective.jsonify for the export and collective.exportimport for the import.

Export with collective.jsonify

Use https://github.com/collective/collective.jsonify to export content.

You can hook up the methods of collective.jsonify using External Methods. See https://github.com/collective/collective.jsonify/blob/master/docs/install.rst for more info.

To work better with collective.exportimport you can extend the exported data using the additional_wrappers feature, e.g. by adding info on the parent of an item to make it easier for collective.exportimport to import the data.

Here is a full example for json_methods.py which should be in <BUILDOUT_ROOT>/parts/instance/Extensions/

from collective.jsonify.export import export_content as export_content_orig
from collective.jsonify.export import get_item

EXPORTED_TYPES = [
    "Folder",
    "Document",
    "News Item",
    "Event",
    "Link",
    "Topic",
    "File",
    "Image",
    "RichTopic",
]

EXTRA_SKIP_PATHS = [
    "/Plone/archiv/",
    "/Plone/do-not-import/",
]

# Path from which to continue the export.
# The export walks the whole site respecting the order.
# It will ignore everything until this path is reached.
PREVIOUS = ""

def export_content(self):
    return export_content_orig(
        self,
        basedir="/var/lib/zope/json",
        skip_callback=skip_item,
        extra_skip_classname=[],
        extra_skip_id=[],
        extra_skip_paths=EXTRA_SKIP_PATHS,
        batch_start=0,
        batch_size=10000,
        batch_previous_path=PREVIOUS or None,
    )

def skip_item(item):
    """Return True if the item should be skipped"""
    portal_type = getattr(item, "portal_type", None)
    if portal_type not in EXPORTED_TYPES:
        return True

def extend_item(obj, item):
    """Extend to work better well with collective.exportimport"""
    from Acquisition import aq_parent
    parent = aq_parent(obj)
    item["parent"] = {
        "@id": parent.absolute_url(),
        "@type": getattr(parent, "portal_type", None),
    }
    if getattr(parent.aq_base, "UID", None) is not None:
        item["parent"]["UID"] = parent.UID()

    return item

To use these, create three External Methods in the ZMI at the Zope root:

  • id: "export_content", module name: "json_methods", function name: "export_content"
  • id: "get_item", module name: "json_methods", function name: "get_item"
  • id: "extend_item", module name: "json_methods", function name: "extend_item"

Then you can pass the extender to the export using a query-string: http://localhost:8080/Plone/export_content?additional_wrappers=extend_item
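
For example, you could trigger the export from a script (a sketch; host, port and credentials are assumptions for your instance):

import requests

response = requests.get(
    "http://localhost:8080/Plone/export_content",
    params={"additional_wrappers": "extend_item"},
    auth=("admin", "admin"),  # assumption: a basic-auth Zope user
)
response.raise_for_status()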

Import with collective.jsonify

Two issues need to be dealt with to allow collective.exportimport to import the data generated by collective.jsonify.

  1. The data is in directories instead of in one large json-file.
  2. The json is not in the expected format.

Starting with version 1.8 you can pass an iterator to the import.

You need to create a directory-walker that sorts the json-files the right way. By default it would import them in the order 1.json, 10.json, 100.json, 101.json and so on.

import json
import logging

from pathlib import Path

logger = logging.getLogger(__name__)


def filesystem_walker(path=None):
    root = Path(path)
    assert root.is_dir()
    folders = sorted([i for i in root.iterdir() if i.is_dir() and i.name.isdecimal()], key=lambda i: int(i.name))
    for folder in folders:
        json_files = sorted([i for i in folder.glob("*.json") if i.stem.isdecimal()], key=lambda i: int(i.stem))
        for json_file in json_files:
            logger.debug("Importing %s", json_file)
            item = json.loads(json_file.read_text())
            item["json_file"] = str(json_file)
            item = prepare_data(item)
            if item:
                yield item

The walker expects path to be the root containing one or more directories that hold the json-files. The sorting of the files is done using the number in the filename.

The method prepare_data modifies the data before passing it to the import. A very similar task is done by collective.exportimport during export.

def prepare_data(item):
    """modify jsonify data to work with c.exportimport"""

    # Drop relationfields or defer the import
    item.pop("relatedItems", None)

    mapping = {
        # jsonify => exportimport
        "_uid": "UID",
        "_type": "@type",
        "_path": "@id",
        "_layout": "layout",
        # AT fieldnames => DX fieldnames
        "excludeFromNav": "exclude_from_nav",
        "allowDiscussion": "allow_discussion",
        "subject": "subjects",
        "expirationDate": "expires",
        "effectiveDate": "effective",
        "creation_date": "created",
        "modification_date": "modified",
        "startDate": "start",
        "endDate": "end",
        "openEnd": "open_end",
        "eventUrl": "event_url",
        "wholeDay": "whole_day",
        "contactEmail": "contact_email",
        "contactName": "contact_name",
        "contactPhone": "contact_phone",
        "imageCaption": "image_caption",
    }
    for old, new in mapping.items():
        item = migrate_field(item, old, new)

    if item.get("constrainTypesMode", None) == 1:
        item = migrate_field(item, "constrainTypesMode", "constrain_types_mode")
    else:
        item.pop("locallyAllowedTypes", None)
        item.pop("immediatelyAddableTypes", None)
        item.pop("constrainTypesMode", None)

    if "id" not in item:
        item["id"] = item["_id"]
    return item


_marker = object()


def migrate_field(item, old, new):
    if item.get(old, _marker) is not _marker:
        item[new] = item.pop(old)
    return item
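
For illustration, a made-up jsonify record before and after prepare_data:

item = {
    "_id": "some-page",
    "_uid": "0123456789abcdef",
    "_type": "Document",
    "_path": "/Plone/some-page",
    "excludeFromNav": True,
}
item = prepare_data(item)
# item now has the keys id, UID, @type, @id and exclude_from_nav
# ("_id" itself is kept, since only the mapped keys are renamed)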

You can pass the generator filesystem_walker to the import:

class ImportAll(BrowserView):

    def __call__(self):
        # ...
        cfg = getConfiguration()
        directory = Path(cfg.clienthome) / "import"

        # import content
        view = api.content.get_view("import_content", portal, request)
        request.form["form.submitted"] = True
        request.form["commit"] = 1000
        view(iterator=filesystem_walker(directory / "mydata"))

        # import default-pages
        import_deferred = api.content.get_view("import_deferred", portal, request)
        import_deferred()


class ImportDeferred(BrowserView):

    def __call__(self):
        self.title = "Import Deferred Settings (default pages)"
        if not self.request.form.get("form.submitted", False):
            return self.index()

        for brain in api.content.find(portal_type="Folder"):
            obj = brain.getObject()
            annotations = IAnnotations(obj)
            if DEFERRED_KEY not in annotations:
                continue

            default = annotations[DEFERRED_KEY].pop("_defaultpage", None)
            if default and default in obj:
                logger.info("Setting %s as default page for %s", default, obj.absolute_url())
                obj.setDefaultPage(default)
            if not annotations[DEFERRED_KEY]:
                annotations.pop(DEFERRED_KEY)
        api.portal.show_message("Done", self.request)
        return self.index()

collective.jsonify puts the info on relations, translations and default-pages in the export-file. You can use the approach of deferring imports to deal with that data after all items have been imported. The example ImportDeferred above uses that approach to set the default pages.

The global_obj_hook below stores that data in an annotation:

from zope.annotation.interfaces import IAnnotations

DEFERRED_KEY = "exportimport.deferred"  # assumption: any unique annotation key works

def global_obj_hook(self, obj, item):
    # Store deferred data in an annotation.
    keys = ["_defaultpage"]
    data = {}
    for key in keys:
        if value := item.get(key, None):
            data[key] = value
    if data:
        annotations = IAnnotations(obj)
        annotations[DEFERRED_KEY] = data

Translations

This product has been translated into

  • Spanish

Contribute

Support

If you are having issues, please let us know.

License

The project is licensed under the GPLv2.

Written by

Starzel.de

collective.exportimport's People

Contributors

ale-rt, avoinea, ericof, erral, flipmcf, fredvd, fulv, gforcada, gotcha, jeffersonbledsoe, jugmac00, macagua, mauritsvanrees, mikejmets, mpeeters, mrtango, pbauer, petschki, pgrunewald, rudd-o, sauzher, stevepiercy, sunew, thet, thibautborn, thomasmassmann, tkimnguyen, valipod, witsch, zopyx


collective.exportimport's Issues

Output src/dest paths on relations.json to improve logging of failed imported relations

If the addition of paths to the relations export (currently optional via debug=1) is always done:

if self.debug:
    item["from_path"] = source.absolute_url_path()
    item["to_path"] = target.absolute_url_path()
results.append(item)

we can provide much better logging on the import step. Now it only logs

2021-05-20 14:30:29,877 INFO [collective.relationhelpers.api:237][waitress-2] 7b02d3b74f57cd1d4a1c782fd74dd649 is missing

Default page for site root not imported

All my default pages import, except for the default page of the plone site root. I notice in defaultpages.json that there is an object with UID plone_site_root, but this object ID does not appear in the content.json file (the actual plone site root has a concrete UID).

It does not look like the defaultpages routine that looks for the site root actually links the default page with the site root.

What gives?

Add export/import for default_page

When properly setting default-pages with setDefaultPage('foo') the target object foo has to exist, and foo is also reindexed with the index is_default_page. So this is a task that needs to be handled after content migration, same as relations.
A side-benefit is that you can skip this part when you want to migrate to a Plone 6 site with Volto (because Volto does not support default-pages).

Event with empty event_url field causes ValidationError in deserialization and "AttributeError: creation_date_migrated"

An Event exported from a Plone 4.3.6 site has the event_url field being an empty string if the field on the original object was not set.
As a result, when importing this object into a 5.2 site, this WARNING will be logged:

WARNING [collective.exportimport.import_content:310] cannot deserialize http://localhost:.../...: BadRequest([{'message': 'The specified URI is not valid.', 'field': 'event_url', 'error': 'ValidationError'}],)

This happens in ImportContent.import_new_content():

        # import using plone.restapi deserializers
        deserializer = getMultiAdapter((new, self.request), IDeserializeFromJson)
        try:
            new = deserializer(validate_all=False, data=item)
        except Exception as error:
            logger.warning(
                "cannot deserialize {}: {}".format(item["@id"], repr(error))
            )
            continue

One of the bad side-effects of this is that the following line is never reached for this object:

https://github.com/collective/collective.exportimport/blob/main/src/collective/exportimport/import_content.py#L356:

            new.creation_date_migrated = creation_date

Therefore, when later running @@reset_dates, this Event will produce this error:

AttributeError: creation_date_migrated

This is because acquisition causes getattr(obj, "creation_date_migrated", None) to be true-ish, but del obj.creation_date_migrated raises AttributeError:

        created = getattr(obj, "creation_date_migrated", None)
        if created and created != obj.creation_date:
            obj.creation_date = created
            del obj.creation_date_migrated
            obj.reindexObject(idxs=["created"])
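
For reference, a sketch of a more defensive check that sidesteps the acquisition pitfall (this is not what the add-on currently does):

from Acquisition import aq_base

created = getattr(aq_base(obj), "creation_date_migrated", None)
if created and created != obj.creation_date:
    obj.creation_date = created
    del obj.creation_date_migrated
    obj.reindexObject(idxs=["created"])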

All of this can be avoided by setting item['event_url'] = None, which is what I did in a custom global_dict_hook():

def global_dict_hook(self, item):
    url = item.get('event_url', None)
    if url == '':
        item['event_url'] = None
    return item

Ultimately, I'm not sure if this should be addressed in plone.restapi or here, but it's one of various issues (like #12) that affect default content types and should not require custom hooks.

self.debug is not defined during relations export

Curious, because the function itself takes a debug parameter. (Plone 5.2 instance)

Traceback:

INFO:interpreter:Exporting relations from site
Traceback (most recent call last):
  File "/home/user/optplone/deployments/master/parts/client1/bin/interpreter", line 293, in <module>
    exec(_val)
  File "<string>", line 1, in <module>
  File "/home/user/optplone/deployments/master/src/ruddocom.policy/ruddocom/policy/export_embedded.py", line 101, in <module>
    full_export(site, exportpath, outputpath, what=what)
  File "/home/user/optplone/deployments/master/src/ruddocom.policy/ruddocom/policy/export_embedded.py", line 74, in full_export
    export_view()
  File "/home/user/optplone/deployments/master/src/collective.exportimport/src/collective/exportimport/export_other.py", line 71, in __call__
    all_stored_relations = self.get_all_references(debug)
  File "/home/user/optplone/deployments/master/src/collective.exportimport/src/collective/exportimport/export_other.py", line 136, in get_all_references
    if self.debug:
AttributeError: 'SimpleViewClass from /home/user/optplone/deploymen' object has no attribute 'debug'

git blame

2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 135)                             }
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 136)                             if self.debug:
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 137)                                 item["from_path"] = from_brain[0].getPath()
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 138)                                 item["to_path"] = to_brain[0].getPath()
52acc20d src/collective/exportimport/export_other.py         (Thibaut Born      2021-11-29 13:40:14 +0100 139)                             item = self.reference_hook(item)
280c5cf9 src/collective/exportimport/export_other.py         (Thibaut Born      2021-11-30 11:20:42 +0100 140)                             if item is None:
280c5cf9 src/collective/exportimport/export_other.py         (Thibaut Born      2021-11-30 11:20:42 +0100 141)                                 continue
2a0ff06f src/collective/exportimport/export_other.py         (Philip Bauer      2021-06-03 14:55:24 +0200 142)                             results.append(item)

Installation instruction question

Hi from Plone Conf!

In the Installation header, the README says "You don't need to install the add-on." after instructions for installing through buildout. There does not seem to be any information about using collective.exportimport without installation. How would collective.exportimport be used without installation? I am personally interested in exporting a complete Plone 4 site to compare results with jsonify.


Export creator information

Currently, content is exported without creator information. There are cases, however, when that information is needed and it is not sufficient to have imported objects created by the importing user. Maybe it would be useful to export creator information optionally, but if so, I'm not sure about whether it should be done by default.

html_fixer breaks links to browser-views

Relative links to browser-views are replaced to link to the current object's parent:

(Pdb++) from collective.exportimport.fix_html import html_fixer
(Pdb++) text = """<p><a href="edit">Link to a browser view</a></p>"""
(Pdb++) html_fixer(text, self.context)
'<p><a data-linktype="internal" data-val="7eb11200ba09ec174b74f24d2fb6f0c1" href="resolveuid/7eb11200ba09ec174b74f24d2fb6f0c1">Link to @@edit view</a></p>'
(Pdb++) self.context.__parent__.UID()
'7eb11200ba09ec174b74f24d2fb6f0c1'

Relations import should use plone.api.relation when available

Currently collective.relationhelpers is used.
When I add this package in a Plone 6 site, startup fails with a configuration conflict:

zope.configuration.config.ConfigurationConflictError: Conflicting configuration actions
  For: ('view', (<InterfaceClass Products.CMFPlone.interfaces.siteroot.IPloneSiteRoot>, <InterfaceClass zope.publisher.interfaces.browser.IDefaultBrowserLayer>), 'inspect-relations', <InterfaceClass zope.publisher.interfaces.browser.IBrowserRequest>)
    File "/Users/maurits/shared-eggs/cp39/Products.CMFPlone-6.0.0a1.dev0-py3.9.egg/Products/CMFPlone/controlpanel/browser/configure.zcml", line 326.2-332.8
        <browser:page
            name="inspect-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".relations.RelationsInspectControlpanel"
            template="relations_inspect.pt"
            permission="Products.CMFPlone.InspectRelations"
            />
    File "/Users/maurits/shared-eggs/cp39/collective.relationhelpers-1.5-py3.9.egg/collective/relationhelpers/configure.zcml", line 9.2-15.8
        <browser:page
            name="inspect-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".api.InspectRelationsControlpanel"
            template="relations_inspect.pt"
            permission="cmf.ManagePortal"
            />
  For: ('view', (<InterfaceClass Products.CMFPlone.interfaces.siteroot.IPloneSiteRoot>, <InterfaceClass zope.publisher.interfaces.browser.IDefaultBrowserLayer>), 'rebuild-relations', <InterfaceClass zope.publisher.interfaces.browser.IBrowserRequest>)
    File "/Users/maurits/shared-eggs/cp39/Products.CMFPlone-6.0.0a1.dev0-py3.9.egg/Products/CMFPlone/controlpanel/browser/configure.zcml", line 334.2-340.8
        <browser:page
            name="rebuild-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".relations.RelationsRebuildControlpanel"
            template="relations_rebuild.pt"
            permission="cmf.ManagePortal"
            />
    File "/Users/maurits/shared-eggs/cp39/collective.relationhelpers-1.5-py3.9.egg/collective/relationhelpers/configure.zcml", line 17.2-23.8
        <browser:page
            name="rebuild-relations"
            for="Products.CMFPlone.interfaces.IPloneSiteRoot"
            class=".api.RebuildRelationsControlpanel"
            template="relations_rebuild.pt"
            permission="cmf.ManagePortal"
            />

Alternatively, collective.relationhelpers should be fixed to not fail in this case.

Support exporting a lot of content

Same as with the import (solved in #4), the export of a lot of data would eat up all the memory on the machine. The python dict that holds the data during export, before it is written to file as json, can be quite large if you choose to include base64-encoded binary data.

I have the use-case to export 60GB of content in files.

Options:

  • Export the blob-path and load each blob from the filesystem. That could be quite efficient.
  • Use https://pypi.org/project/jsonlines as format and write one object at a time to the filesystem. This would require changes in the import since jsonlines is not readable by json or ijson.
  • Fake using jsonlines by writing one object at a time into a file, but add a comma at the end of each line and wrap it in []. This would create a valid json file that the import could read (see the sketch below).
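
A sketch of the third option (function name and signature are made up):

import json


def write_items(items, path):
    """Write one serialized object per line, yet produce one valid json list."""
    with open(path, "w") as f:
        f.write("[\n")
        for index, item in enumerate(items):
            if index:
                f.write(",\n")
            f.write(json.dumps(item))
        f.write("\n]")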

Imported portlet fails

With this data:

[
    {
        "portlets": {
            "plone.footerportlets": [
                {
                    "type": "plone.portlet.static.Static",
                    "visible": true,
                    "assignment": {
                        "header": "Want to comment on this?",
                        "text": {
                            "data": "<p style=\"text-align: center;\"><em>Want to comment on this?\u00a0 </em><a rel=\"noopener\" target=\"_blank\" href=\"https://t.me/Site_com\" data-linktype=\"external\" data-val=\"https://t.me/Site_com\"><strong>Join our Telegram</strong> channel</a> and share your opinions in each post!</p>",
                            "content-type": "text/html",
                            "encoding": "utf-8"
                        },
                        "omit_border": true,
                        "footer": null,
                        "more_url": null
                    }
                }
            ]
        },
        "uuid": "1bb9868bd9de4637977348acab5d1893"
    },
[...]

the imported portlet won't render, and editing shows this:

We’re sorry, but there seems to be an error…

Here is the full error message:

Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 167, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 376, in publish_module
  Module ZPublisher.WSGIPublisher, line 271, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 68, in call_object
  Module plone.app.portlets.browser.formhelper, line 177, in __call__
  Module z3c.form.form, line 233, in __call__
  Module plone.z3cform.fieldsets.extensible, line 65, in update
  Module plone.z3cform.patch, line 30, in GroupForm_update
  Module z3c.form.group, line 132, in update
  Module z3c.form.form, line 136, in updateWidgets
  Module z3c.form.field, line 277, in update
  Module plone.app.textfield.widget, line 42, in update
  Module z3c.form.browser.textarea, line 37, in update
  Module z3c.form.browser.widget, line 171, in update
  Module Products.CMFPlone.patches.z3c_form, line 46, in _wrapped
  Module z3c.form.widget, line 132, in update
  Module plone.app.textfield.widget, line 99, in toWidgetValue
ValueError: Can not convert {'data': '<p style="text-align: center;"><em>Want to comment on this?\xa0 </em><a rel="noopener" target="_blank" href="https://t.me/RuddO_com" data-linktype="external" data-val="https://t.me/RuddO_com"><strong>Join our Telegram</strong> channel</a> and share your opinions in each post!</p>', 'content-type': 'text/html', 'encoding': 'utf-8'} to an IRichTextValue

Looks like the text object is being passed in lieu of the text subobject. Something is wrong with the deserializer.

Export and import content revisions

I plan to implement the export and import of the full revision-history created by CMFEditions.
I'm still undecided if it should be an optional step during the default migration or an additional export/import step. Maybe the latter, to be able to limit the number of exported and imported revisions.

error on bin/buildout on plone 4.2.1

Hello, I'm trying to install on plone 4.2.1 but I have this error.
Is this compatible with 4.2.1? Thanks


root@intranet2:/opt/buildout/Plone4.2.1Intranet/zeocluster#  ./bin/buildout
Updating zeoserver.
Installing client1.
Getting distribution for 'hurry.filesize'.
Got hurry.filesize 0.9.
Getting distribution for 'collective.exportimport'.
/opt/buildout/Plone4.2.1Intranet/Python-2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'project_urls'
  warnings.warn(msg)
/opt/buildout/Plone4.2.1Intranet/Python-2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'
  warnings.warn(msg)
error: Setup script exited with error in collective.exportimport setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers
An error occured when trying to install collective.exportimport main. Look above this message for any errors that were output by easy_install.
While:
  Installing client1.
  Getting distribution for 'collective.exportimport'.
Error: Couldn't install: collective.exportimport main
*************** PICKED VERSIONS ****************
[versions]
Products.LinguaPlone = 4.1.2
hurry.filesize = 0.9

*************** /PICKED VERSIONS ***************

topic sort criteria do not get added to the json on export

When testing the topic to collection migration we noticed that the sort_on and sort_reversed metadata does not get migrated.
It seems it was simply forgotten to add these to the metadata.

With a quick look it seems that it might be fixed by replacing

    self._collection_sort_reversed = criterion.getReversed()
    self._collection_sort_on = criterion.Field()

with

topic_metadata["sort_reversed"] = criterion.getReversed()
topic_metadata["sort_on"]  = criterion.Field()

in this file:
https://github.com/collective/collective.exportimport/blob/main/src/collective/exportimport/serializer.py#L341

or adding it below the
topic_metadata["query"] = json_compatible(formquery) on line 361

related: https://github.com/plone/plone.app.contenttypes/blob/1.1.1/plone/app/contenttypes/migration/topics.py#L505

Let me know if this seems correct to you, then we will make a PR

"childen"

The export content page has a typo, it should say children, it says childen.

Include more migrations in default content migration

Some export/imports that do not rely on other content (which might not yet exist at the time of importing) could be included in the default export/import of content. One example is constraints, which is implemented like that in #71.

Other options are:

  • local roles
  • discussions/comments
  • portlets - I think portlets are safe to include because they can't hold relations or binary data, right?

We could add hooks like item = self.export_constrain(item, obj) and self.import_constrain(obj, item) for each to make it easy to override. We could also add checkboxes to enable/disable these extra-steps during export.

In client projects I also export/import some marker-interfaces and annotations but I don't think that could be generalized.
It would probably be enough to add some more documentation with simple examples of how to do that.

ATField (without base64 data) -> DXField: data missing

Somehow I do not get the trick. If I export my ATFileAttachment without base64 data it looks like this:

(screenshot of the exported ATFileAttachment JSON)

Now importing this with a custom_dict_hook ("ATFileAttachment" -> "DXFile") creates a File object as expected, but the file field has no data 😢 ... so if I'm correct, the plone.restapi deserializer doesn't load data from remote origins? Do I have to take care of this myself?

@pbauer @fredvd can anyone give me a hint on this?

Save last export date per portal_type

We have a site with a lot of content admin traffic and there are new articles on the old page after we exported its content. So my idea is to save the export date for each portal_type in a small portal annotation dict and add a checkbox on the @@export_content page to export types after the last_export_date... this date could/should also be shown in the type selector label ...
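
A sketch of how that could look (the annotation key and helper name are made up):

from persistent.mapping import PersistentMapping
from plone import api
from zope.annotation.interfaces import IAnnotations

import datetime

LAST_EXPORT_KEY = "collective.exportimport.last_export"


def remember_export_date(portal_type):
    annotations = IAnnotations(api.portal.get())
    store = annotations.setdefault(LAST_EXPORT_KEY, PersistentMapping())
    store[portal_type] = datetime.datetime.now()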

Ignoring some portlets during import

I have a portlet export that includes a portlet that I no longer want. Currently, the import fails, because the portlet type is not known:

Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 167, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 376, in publish_module
  Module ZPublisher.WSGIPublisher, line 271, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 68, in call_object
  Module collective.exportimport.import_other, line 582, in __call__
  Module collective.exportimport.import_other, line 597, in import_portlets
  Module collective.exportimport.import_other, line 618, in register_portlets
  Module zope.component._api, line 165, in getUtility
zope.interface.interfaces.ComponentLookupError:
(<InterfaceClass zope.component.interfaces.IFactory>, 'collective.quickupload.QuickUploadPortlet')

It would be nice if the importer could ignore it.
I see two options:

  1. Catch the ComponentLookupError from above, log a warning, and continue with the next portlet (see the sketch after this list).
  2. Have a list of portlet types that must be ignored. Empty by default, but easily monkey patched.
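
A minimal sketch of the first option (the helper name is made up; the interfaces match the traceback above):

from zope.component import getUtility
from zope.component.interfaces import IFactory
from zope.interface.interfaces import ComponentLookupError

import logging

logger = logging.getLogger(__name__)


def get_portlet_factory(portlet_type):
    """Return the portlet factory, or None if its add-on is gone."""
    try:
        return getUtility(IFactory, name=portlet_type)
    except ComponentLookupError:
        logger.warning("Ignoring unknown portlet type %s", portlet_type)
        return None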

I prefer the first one, as it is a small change and works for everyone. But maybe we prefer that integrators explicitly catch the error. For the second one I now have this monkey patch, from which I could make a PR.

from collective.exportimport import import_other

import logging


logger = logging.getLogger(__name__)
_orig_register_portlets = import_other.register_portlets
IGNORE_PORTLET_TYPES = [
    "collective.quickupload.QuickUploadPortlet",
]


def register_portlets(obj, item):
    """Register portlets for one object.

    CHANGE compared to original: pop unwanted portlets.

    I tried to override the browser view first,
    but then I would have had to copy the template.
    """
    for manager_name, portlets in item.get("portlets", {}).items():
        if not portlets:
            continue
        ignore = []
        for portlet_data in portlets:
            if portlet_data["type"] in IGNORE_PORTLET_TYPES:
                logger.info(
                    "Ignoring portlet type %s at %s",
                    portlet_data["type"],
                    obj.absolute_url(),
                )
                ignore.append(portlet_data)
        for portlet_data in ignore:
            portlets.remove(portlet_data)
    return _orig_register_portlets(obj, item)


import_other.register_portlets = register_portlets
logger.info(
    "Patched collective.exportimport register_portlets to ignore these types: %s",
    IGNORE_PORTLET_TYPES,
)

Do we have a preference?

Using JSONL as an alternative migration format

Exported JSON files can be become rather large in particular with inline binary data (which is often needed rather than having a reference to a blob file). Using JSONL would improve the handling of exports a lot. In particular, you could filter JSON records more easily using command line tools like grep.
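
As a sketch, this is all the JSONL handling would boil down to (no extra dependency strictly needed):

import json


def dump_jsonl(items, path):
    # one serialized object per line
    with open(path, "w") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def load_jsonl(path):
    # lazily parse one object per line
    with open(path) as f:
        for line in f:
            yield json.loads(line)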

Relations import, isReferencing is dropped in the target site after being filled by import_content?

Something funny is going on with the relations import.

  • I think I forgot to import relations in my target Plone 5.2 site. When I use @@inspect_relations from collective.relationhelpers I see around 1800 isReferencing relations in the target site.
  • So I exported relations again from the source Plone 4 site, import them in the target site.
  • Now my 1800 isReferencing relations are gone, but 800 relatesTo (related items) relations have been added.

In import_other isReferencing is added to the ignores list:

ignore = [
    "translationOf",  # old LinguaPlone
    "isReferencing",  # linkintegrity
    "internal_references",  # obsolete

From the rest of the code in this method it seems all existing relations in the target site are removed and only the 'sanitised' relations are imported again, but this destroys all linkintegrity relations.

I assume the import_content imports are recreating the isReferencing relations while restoring the content items.

The most obvious fix would be to not drop the relations

in:

for rel in data:
    if rel["relationship"] in ignore:
        continue
    rel["from_attribute"] = self.get_from_attribute(rel)
    all_fixed_relations.append(rel)
all_fixed_relations = sorted(
    all_fixed_relations, key=itemgetter("from_uuid", "from_attribute")
)
relapi.purge_relations()
relapi.cleanup_intids()
relapi.restore_relations(all_relations=all_fixed_relations)

But this does not work, I get ObjectMissingError in z3c.relationfield.event.updateRelations where it tries to list existing relations:

[9] > /Users/fred/.buildout/eggs/cp38/z3c.relationfield-0.9.0-py3.8.egg/z3c/relationfield/event.py(81)updateRelations()
-> rels = list(catalog.findRelations({'from_id': obj_id}))
[10]   /Users/fred/.buildout/eggs/cp38/zc.relation-1.1.post2-py3.8.egg/zc/relation/catalog.py(734)<genexpr>()
-> return (self._relTools['load'](t, self, cache) for t in tokens)
[11]   /Users/fred/.buildout/eggs/cp38/z3c.relationfield-0.9.0-py3.8.egg/z3c/relationfield/index.py(49)load()
-> return intids.getObject(token)
[12]   /Users/fred/.buildout/eggs/cp38/zope.intid-4.3.0-py3.8.egg/zope/intid/__init__.py(89)getObject()
-> raise ObjectMissingError(id)

Stranger is that when I step back in the debugger to frame 9 and execute the same line again I do get the rels list:

(Pdb++) list(catalog.findRelations({'from_id': obj_id}))
[<z3c.relationfield.relation.RelationValue object at 0x11eefdac0 oid 0x200b0 in <Connection at 10d0bb760>>]

Or I could remove isReferencing from the ignores list

but then I'm importing the Plone 4 Archetypes linkintegrity lists on recreated Dexterity Content. It could work as no paths/ids etc have changed and for non existing content the relations will be dropped. But it doesn't feel very clean to do this.

[edit:] I tried this and the isReferencing count is actually increasing when I include isReferencing relations on import. So importing content creates 1848 isReferencing relations, then restoring relations ups isReferencing to 1941 relations. :-O

Export existing relations on the target site and merge them with the imported relations.

Meh. I mean doing this in the import code: make an extra export on the fly, merge it with the incoming relations.json and reapply, dropping isReferencing only from the relations.json.

@pbauer How did you deal with this so far while using collective.exportimport?

Component conversation_view not available when executing view export_content() from the Plone CLI

I have a snippet of code that is supposed to export (and it works with all exports other than content), but when I try it with content export, I get a zero-bytes JSON file. This snippet of code runs as an entry point of my bin/client program.

Does anyone know what the problem with the export is?

Code: https://gist.github.com/Rudd-O/f46154c80eb9937ec387e2b460ebbe8b

EDIT:

Ultimately the goal is to be able to command an export of everything via the Plone CLI (bin/client -c export.py), primarily for (but not limited to) migration automation and testing. All other exports work correctly using this code — only the content one does not.

Collection queries with old query param

When exporting ATCollection items the queries might contain outdated configurations. The following query doesn’t work in Plone 5 with plone.app.querystring > 1.3.2 (https://github.com/plone/plone.app.querystring/blob/master/CHANGES.rst#1312-2015-11-26)

        "query": [
            {
                "i": "portal_type", 
                "o": "plone.app.querystring.operation.selection.is", 
                "v": [
                    "News Item"
                ]
            }, 
            {
                "i": "path", 
                "o": "plone.app.querystring.operation.string.relativePath", 
                "v": "../"
            }
        ],

The correct version would be:

        "query": [
            {
                "i": "portal_type", 
                "o": "plone.app.querystring.operation.selection.any", 
                "v": [
                    "News Item"
                ]
            }, 
            {
                "i": "path", 
                "o": "plone.app.querystring.operation.string.relativePath", 
                "v": "../"
            }
        ],

With plone.app.querystring there is an upgrade step available (8 -> 9) which you can run to fix imported collections with the wrong queries.

Besides that, should we add some logic to adjust the queries on export/import?

Export part of a production site?

Hi, sorry for not having a closer look first. If it is already possible, then all is good; if it is not yet possible, but desirable, I might contribute the missing code if such a use case would be welcome:

So my use case would be to be able to export (and re-import) a part of the website. The main idea is to export anything that is linked from a few overview pages and be able to re-import that on an empty plone site. This would allow for example our designer to work with real content but not have to work with the 1M objects we have on our database 🙂

Is that already possible? 🤔 Would it be ok to provide such functionality here in this package?

On import DX-deserializer fails if expires is set before effective

In old Plone sites this is apparently no problem to set, but when we call the DX deserializer on an imported item where effective > expires, the import is aborted with a ValidationError:

orig_error is my local patch in plone.restapi to see the real message when I get dropped in to a PDB (PDBDebugMode)


*** zExceptions.BadRequest: [{'error': 'ValidationError', 
'message': 'error_expiration_must_be_after_effective_date', 
'orig_error': EffectiveAfterExpires('error_expiration_must_be_after_effective_date'),
'id': 'http://localhost:9050/plone/nl/kalender/asfasdfdsfasdf.pdf', 

'orig_data': {'@id': 'http://localhost:9050/plone/nl/kalender/asdfasdfsdfads.pdf', '@type': 'File', 'UID': '5397b307c9cc784319fca831cb0994d9', 'allow_discussion': False, 'contributors': [], 'created': '2015-09-07T12:40:17+00:00', 'creators': ['asdfasfdf'], 'description': None, 'effective': '2015-09-07T12:40:00+00:00', 'exclude_from_nav': True, 'expires': '2014-02-01T00:00:00+00:00',

KeyError: 'parent' while exporting discussion comments

Plone 4.3.20 (AT) with current checkout from master generates these errors for discussion items:

2022-04-12T16:07:55 ERROR collective.exportimport.export_content Error exporting http://dev2.zopyx.de:5080/eteaching/community/communityevents/ringvorlesung/hybride-lehrszenarien-gestalten/++conversation++default/1602677734552386
Traceback (most recent call last):
  File "/home/ajung/sandboxes/iwm/plone4.buildout/src/collective.exportimport/src/collective/exportimport/export_content.py", line 284, in export_content
    item = self.fix_url(item, obj)
  File "/home/ajung/sandboxes/iwm/plone4.buildout/src/collective.exportimport/src/collective/exportimport/export_content.py", line 428, in fix_url
    if item["parent"]["@id"] != parent_url:
KeyError: 'parent'

Wrong @id for exported Collections

Exported collections have the wrong @id attribute set.

[
    {
        "@id": "http://localhost:8080/Plone/@@export_content", 
        "@type": "Collection", 
        "UID": "e6b3bf21738d4866b9acd1d0e7a1cf51", 
        "allow_discussion": false, 
        "..."
    }
]

I was able to bypass this with a custom export view:

class CustomExportContent(ExportContent):

    def dict_hook_collection(self, item, obj):
        """Use this to modify or skip the serialized data by type.
        Return the modified dict (item) or None if you want to skip this particular object.
        """
        # Fix the id for collections, which is set to "@@export-content" because of the HypermediaBatch in plone.restapi
        item["@id"] = obj.absolute_url()
        return item

For the import, only the @id is required, although there are more properties with the wrong url:

        "batching": {
            "@id": "http://localhost:8080/Plone/@@export_content", 
            "first": "http://localhost:8080/Plone/@@export_content?b_start=0", 
            "last": "http://localhost:8080/Plone/@@export_content?b_start=50", 
            "next": "http://localhost:8080/Plone/@@export_content?b_start=25"
        }, 

@pbauer, any opinion on how to fix this? Would it be ok to add the hook to the add-on directly?

plone.formwidget.geolocation.geolocation.Geolocation: No converter for making ....JSON compatible

Not a bug, perhaps a feature request for supporting plone.formwidget.geolocation.geolocation.Geolocation:

2021-12-11 18:05:41,976 INFO    [collective.exportimport.export_content:299][waitress-2] Error exporting https://test.dynamore.de/en/locations/subsidiaries/dynamore-swiss-en: No converter for making <plone.formwidget.geolocation.geolocation.Geolocation object at 0x7fed111cff28> (<class 'plone.formwidget.geolocation.geolocation.Geolocation'>) JSON compatible

Exporting portlet with relation field fails

I have several relation fields in a portlet. Exporting portlets then fails because a RelationValue is not json serialisable:

http://localhost:9152/plone/@@export_portlets
Traceback (innermost last):
  Module ZPublisher.Publish, line 138, in publish
  Module ZPublisher.mapply, line 77, in mapply
  Module ZPublisher.Publish, line 48, in call_object
  Module collective.exportimport.export_other, line 477, in __call__
  Module json, line 251, in dumps
  Module json.encoder, line 209, in encode
  Module json.encoder, line 431, in _iterencode
  Module json.encoder, line 332, in _iterencode_list
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 332, in _iterencode_list
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 408, in _iterencode_dict
  Module json.encoder, line 442, in _iterencode
  Module json.encoder, line 184, in default
TypeError: <z3c.relationfield.relation.RelationValue object at 0x114132050> is not JSON serializable

A bit related is this comment from Philip where he removes some relations code, although I guess this was only active when exporting content, and not portlets.

The following diff in the portlet export code fixes it for me:

$ git diff
diff --git a/src/collective/exportimport/export_other.py b/src/collective/exportimport/export_other.py
index f358a1c..383635a 100644
--- a/src/collective/exportimport/export_other.py
+++ b/src/collective/exportimport/export_other.py
@@ -535,13 +535,18 @@ def export_local_portlets(obj):
             settings = IPortletAssignmentSettings(assignment)
             if manager_name not in items:
                 items[manager_name] = []
+            from z3c.relationfield.relation import RelationValue
+            assignment_data = {}
+            for name in schema.names():
+                value = getattr(assignment, name, None)
+                if value and isinstance(value, RelationValue):
+                    value = value.to_object.UID()
+                assignment_data[name] = value
+
             items[manager_name].append({
                 'type': portlet_type,
                 'visible': settings.get('visible', True),
-                'assignment': {
-                    name: getattr(assignment, name, None)
-                    for name in schema.names()
-                },
+                'assignment': assignment_data,
             })
     return items

The code needs to be more robust, but those are details.
I am not sure if this is a reasonable place for this fix or if there is a more general place.

Ah, wait, using this works too:

json_compatible(getattr(assignment, name, None))

At least then you get an export without errors, although my earlier code that returns uuids could be preferable in some cases.

Package is not installable with pip

Trying to install collective.exportimport with pip does not work, raising the error
ERROR: Package 'collective.exportimport' requires a different Python: 3.8.1 not in '==2.7, >=3.6'

The issue is in the python_requires specification, and can be seen with:

pip install packaging

python
>>> from packaging.specifiers import SpecifierSet
>>> from packaging.version import Version
>>> Version("3.8.1") in SpecifierSet("==2.7, >=3.6")
False
>>> Version("2.7.10") in SpecifierSet("==2.7, >=3.6")
False

PEP 440 specifies: "The comma (",") is equivalent to a logical and operator: a candidate version must match all given version clauses in order to match the specifier as a whole."

Importing richtext via restapi unescapes escaped html-entities

In plone.restapi deserializing richtext uses html_parser.unescape(data) before setting the RichTextValue. See https://github.com/plone/plone.restapi/blob/master/src/plone/restapi/deserializer/dxfields.py#L292

This leads to broken content because code-examples are transformed to html-tags: <pre>Code example: &lt;h2&gt;Heading 2&lt;/h2&gt; example</pre> becomes <pre>Code example: <h2>Heading 2</h2> example</pre>

I'm not sure if this is a bug in restapi but for the purpose of exportimport I will override the RichTextFieldDeserializer with a version that does not do that for now.

Exporting relations sometimes fails with certain content objects that cannot be adapted (on crufty ZODBs)

Here is a diff that allows the program to continue.

diff --git a/src/collective/exportimport/export_other.py b/src/collective/exportimport/export_other.py
index 2cf8e33..201f7e7 100644
--- a/src/collective/exportimport/export_other.py
+++ b/src/collective/exportimport/export_other.py
@@ -117,7 +117,12 @@ class ExportRelations(BrowserView):
             if relation_catalog:
                 portal_catalog = getToolByName(self.context, "portal_catalog")
                 for rel in relation_catalog.findRelations():
-                    if rel.from_path and rel.to_path:
+                    try:
+                        rel_from_path_and_rel_to_path = rel.from_path and rel.to_path
+                    except ValueError:
+                        logger.exception("Cannot export relation %s, skipping", rel)
+                        continue
+                    if rel_from_path_and_rel_to_path:
                         from_brain = portal_catalog(
                             path=dict(query=rel.from_path, depth=0)
                         )

Add modifiers to simplify handling migrations

Some changes need to be done to the serialized data when this tool is used for a migration (e.g. from Plone 4 to Plone 5 or 6) and/or from Archetypes to Dexterity.

To make this easier we could add a checkbox (checked by default) "Modify data for migrations".

If this is checked then some modifiers will run during export.

These could include:

Drop unused data

Some data that restapi includes is useless for migrations. E.g. @components, next_item, previous_item.
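
For example, a modifier could simply pop those keys (a sketch; this would run in a dict hook where item is the serialized data):

for key in ("@components", "next_item", "previous_item"):
    item.pop(key, None)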

Drop all Relations

Relations are migrated separately. Having them in the data will mess up the site. This is probably easiest done by switching on custom serializers for IReferenceField (AT) and IRelationChoice and IRelationList (DX) that return None.

Change default-fieldnames (AT to DX):

    # Migrate AT to DX
    if item.get("expirationDate"):
        item["expires"] = item["expirationDate"]
    if item.get("effectiveDate"):
        item["effective"] = item["effectiveDate"]
    if item.get("excludeFromNav"):
        item["exclude_from_nav"] = item["excludeFromNav"]
    if item.get("subject"):
        item["subjects"] = item["subject"]

Fix datetime data

see #10

Fix issue with AT Text fields

TextField-export in Archetypes: Inspecting the AT-schema and applying a change for all Textfields if the RichtextWidget is not used (which means the field is probably Text in DX and not RichText).

    # In Archetypes Text is handled the same as RichText
    if isinstance(item.get("description", None), dict):
        item[fieldname] = item[fieldname]["data"]
    if isinstance(item.get('rights', None), dict):
        item['rights'] = item['rights']['data']

Fix collection-criteria

Some criteria have changed, e.g.

query = item.pop("query", [])
for crit in query:
    if crit["o"].endswith("relativePath") and crit["v"] == "..":
        crit["v"] = "..::1"
    if crit["i"] == "portal_type" and crit["o"].endswith("selection.is"):
        crit["o"] = "plone.app.querystring.operation.selection.any"

Fix image links and scales

Use the code in https://github.com/collective/collective.migrationhelpers/blob/master/src/collective/migrationhelpers/images.py to fix links to images and make them editable in TinyMCE.

[proposal] unified export format

problem

  • Currently you have to use several different forms to do both the export and then the import
  • You also have to know the right order to do the import when you have multiple export json files

proposed solution

User will go to the site setup and click on export and get a UI something like

-------------------------------------------------------------
| Warning: multiple exports selected. Download will be tar.gz |
-------------------------------------------------------------

# Exports
[x] Content
[x] File/Images
[  ] Users
[  ] Content Tree
[  ] Relations
[  ] Translations
[  ] Local Roles
[  ] Default Page Mapping
[  ] Object Positions in Parent
[  ] Comments


# Content Export

{query widget}
Type: Page
Path: /news depth:1
Path: /other-news depth:1
Creation date: > 1/1/2018

Selected content (21 items)
--------------------------------
| /news/item1
| /news/item2
| /other-news/big-news
--------------------------------

# File/Images Export

(o) url/path
(  ) binary in tar.gz 
(  ) base64 encoded in json

[Download] [Save to Server] [Dry Run] [Cancel]

Features this adds

  • tar.gz if multiple exports selected
    • still keeps json as multiple files
    • Would enable things to be put in the right order
    • still memory efficient as tar can be streamed.
  • somehow take every exporter and merge them into one UI.
    • Adapter to get schema?

Alternatives considered

  • still have separate views?
  • use a single json format that contains the others
    • can still stream it and webserver might do gzip compression automatically anyway
    • can still choose to include blobs for small sites using base64 encoding.
    • con: might be harder for the user to inject or change blobs without writing code?

Feature request: exclude path

Just as the content export has a "start from this path" input field, there should be a "do not recurse into the following paths" field (perhaps a multiline field / list) that makes the content exporter iterator skip anything below those paths entirely. For many use cases this is not necessary, since the exported JSON can be filtered to exclude said content, but there are cases, on very large sites, where the user would rather not wait until everything is exported.
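
A minimal sketch of such path filtering (the excluded paths and the catalog query are hypothetical):

    # Sketch: skip catalog brains at or below any excluded path.
    from plone import api

    catalog = api.portal.get_tool("portal_catalog")
    excluded = ["/Plone/archive", "/Plone/tmp"]

    def keep(brain):
        path = brain.getPath()
        return not any(
            path == prefix or path.startswith(prefix + "/")
            for prefix in excluded
        )

    brains = [brain for brain in catalog(portal_type="Document") if keep(brain)]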

Concomitant to this, perhaps it would not be a bad idea to allow exports of multiple roots rather than a single one. Right now I have four Plone sites in a single Zope instance, and I have to export them all individually rather than all at once. It's a bit of a bummer, frankly (and I also could never get the plone api get view thingie to work in scripts; when I do it that way, no objects are exported).

Export/import workflow_history

I plan to add support for export/import of workflow history. It will probably be straightforward and part of the default process.

Feature request: import / export content rules

Plone content rules are stored in an IRuleStorage object which is visible neither in Zope space nor in content space (it can only be obtained using getUtility()). Implementation-wise, IRuleStorage is implemented as a simple OOBTree() that can be acquired (plone.app.contentrules.browser.assignments.acquired_rules() has the scoop).
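
For reference, a minimal sketch of obtaining the storage, assuming the interface from plone.contentrules:

    # Sketch: the rule storage is a utility, reachable only via getUtility().
    from plone.contentrules.engine.interfaces import IRuleStorage
    from zope.component import getUtility

    storage = getUtility(IRuleStorage)
    for rule_id, rule in storage.items():
        print(rule_id, rule.title)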

It would be great if the import/export framework had a step to deal with the rule storage. This would require at least a custom pair of serializer/deserializer.

AttributeError: modification_date_migrated while resetting dates

Traceback (most recent call last):
  File "/home/user/optplone/deployments/601a/parts/client1/bin/interpreter", line 294, in <module>
    exec(_val)
  File "<string>", line 1, in <module>
  File "/home/user/optplone/deployments/601a/src/ruddocom.policy/src/ruddocom/policy/ctl/import_embedded.py", line 105, in <module>
    full_import(site, importpath, what)
  File "/home/user/optplone/deployments/601a/src/ruddocom.policy/src/ruddocom/policy/ctl/import_embedded.py", line 85, in full_import
    fixer()
  File "/home/user/optplone/deployments/601a/src/collective.exportimport/src/collective/exportimport/import_content.py", line 697, in __call__
    portal.ZopeFindAndApply(portal, search_sub=True, apply_func=reset_dates)
  File "/home/user/optplone/buildout-cache/eggs/Zope-5.3-py3.8.egg/OFS/FindSupport.py", line 171, in ZopeFindAndApply
    self.ZopeFindAndApply(ob, obj_ids, obj_metatypes,
  File "/home/user/optplone/buildout-cache/eggs/Zope-5.3-py3.8.egg/OFS/FindSupport.py", line 171, in ZopeFindAndApply
    self.ZopeFindAndApply(ob, obj_ids, obj_metatypes,
  File "/home/user/optplone/buildout-cache/eggs/Zope-5.3-py3.8.egg/OFS/FindSupport.py", line 165, in ZopeFindAndApply
    apply_func(ob, (apply_path + '/' + p))
  File "/home/user/optplone/deployments/601a/src/collective.exportimport/src/collective/exportimport/import_content.py", line 688, in reset_dates
    del obj.modification_date_migrated
AttributeError: modification_date_migrated

Why is this happening?
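
One plausible explanation (an assumption, not confirmed in this thread): ZopeFindAndApply visits every object in the site, and Zope's acquisition can make an attribute lookup succeed on an object that only inherits modification_date_migrated from a parent, so the subsequent del on the object itself raises AttributeError. A defensive sketch that checks the unwrapped object first (not the shipped implementation):

    # Sketch: guard against acquisition before deleting the attribute.
    from Acquisition import aq_base

    def reset_dates(obj, path=None):
        base = aq_base(obj)  # strip acquisition wrappers
        modified = getattr(base, "modification_date_migrated", None)
        if modified is not None:
            obj.modification_date = modified
            del base.modification_date_migrated
            obj.reindexObject(idxs=["modified"])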

more control over content exported (via query widget)

problem

  • currently can only select by type or single path (+ depth)

Solution

  • single query widget to replace path, depth and type
  • have a default query with all types and root folder selected to make it quicker to adjust
  • show count/preview of selected content. Maybe counts of all the content types selected?
  • some kind of adapter to make it easy to insert specific options if another exporter is added?
-------------------------------------------------------------
| Warning: multiple exports selected. Download will be tar.gz |
-------------------------------------------------------------

# Exports
[x] Content
[x] File/Images
[  ] Users
[  ] Content Tree
[  ] Relations
[  ] Translations
[  ] Local Roles
[  ] Default Page Mapping
[  ] Object Positions in Parent
[  ] Comments

# Content Export

{query widget}
Type: Page, News Item
Path: /news depth:1
Path: /other-news depth:1
Creation date: > 1/1/2018

-----------------------------
Selected content (21 items, 9Mb)
- News Item (10, 5Mb)
- Page (9, 2Mb)
- Folders (implicit) (2, 1Mb)
-----------------------------

# File/Images Export

(o) url/path
(  ) binary in tar.gz 
(  ) base64 encoded in json

[Download] [Save to Server] [Dry Run] [Cancel]

Alternatives considered

  • it would be more intuitive to put the options for each exporter indented directly under its checkbox,
    like "Conditionally revealing a related question" in GDS
  • Query widget should probably move to the top since it's used for almost all exporters?
  • show a list of content rather than counts by type? or maybe counts by path?

import_content: Dry run option with more verbose deserializer validation feedback

While test-using collective.exportimport I'm running into edge cases. I could solve these by adding more code to a subclassed export_content or import_content view in my own custom migrationhelper package.

But the edge case is often 1 field on 1-3 content items in an 8-year-old site. It's just not worth it to try to catch these in code; it's easier to fix them in the source site. See #12 as an example.

The main cause for validation errors is the schema validation running at the deserializer step of import_content in plone.restapi. And plone.restapi catches all validation errors and rethrows them with a generic ValidationError class that only shows the field, but not the error. (https://github.com/plone/plone.restapi/blob/f89276054088340b3ec6775db6280b1dc46f0866/src/plone/restapi/deserializer/dxcontent.py#L55-L60)

To support fixing these blips of migration issues, it would be nice to have a 'dry_run' option in import_content that tries to create and deserialize the JSON, catches any validation errors and outputs a report of the original errors coming from https://github.com/plone/plone.restapi/blob/f89276054088340b3ec6775db6280b1dc46f0866/src/plone/restapi/deserializer/dxcontent.py#L45

For this to work we'd need a new feature in plone.restapi to not swallow the ValidationErrors as shown above.
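
A rough sketch of what such a dry run could look like (all names here are hypothetical, not part of collective.exportimport): deserialize each item inside a transaction savepoint and roll back, collecting whatever errors surface.

    # Sketch: validate items without persisting anything.
    import transaction
    from plone.restapi.interfaces import IDeserializeFromJson
    from zope.component import getMultiAdapter

    def dry_run_import(container, items, request):
        errors = []
        for item in items:
            savepoint = transaction.savepoint()
            try:
                new_id = container.invokeFactory(item["@type"], item["id"])
                new = container[new_id]
                deserializer = getMultiAdapter(
                    (new, request), IDeserializeFromJson
                )
                deserializer(validate_all=True, data=item)
            except Exception as exc:
                errors.append((item.get("@id"), repr(exc)))
            finally:
                savepoint.rollback()
        return errors

Without the plone.restapi change described above, such a report would still only show the generic ValidationError per field.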

Small improvement: add z3c.relationfield, plone.app.contenttypes in install_requires/setup.py

Using a really minimal Plone 4 installation (https://github.com/collective/minimalplone4/) we get these errors when adding collective.exportimport to the eggs:

collective.exportimport-1.0-py2.7.egg/collective/exportimport/configure.zcml", line 13.2-19.8
    ImportError: No module named relationfield.interfaces

collective.exportimport-1.0-py2.7.egg/collective/exportimport/configure.zcml", line 57.2-58.52
    ImportError: No module named contenttypes.interfaces

Funny that this wasn't caught in tests.

Other pins were needed as well, but that is out of scope for this project (noting them here for the sake of documentation):

# Python 2/Plone 4 compatibility.                                                          
plone.restapi = 6.13.8
PyJWT = 1.7.1                                                                      
# https://github.com/Julian/jsonschema/issues/453                                  
# Getting distribution for 'pyrsistent>=0.14.0'                                    
# ValueError: need more than 0 values to unpack                                    
jsonschema = 2.6.0

On import the ModifiedDate datestring's timezone is not recognised

Copying this issue from what I wrote on community.plone.org to keep the list of possible switches/fixes central on GitHub.

While importing an exported contenttype from Plone 4/AT to Plone 5.2 DX, the import routine gives a traceback on:

    modified_data = datetime.strptime(modified, "%Y-%m-%dT%H:%M:%S%z")

The problem is the timezone part of the date string: the exported modified date is u'2011-10-13T13:49:57+01:00', with a : separator in the timezone offset, while strptime on Python 3 expects +0100. My workaround so far has been simple and is on https://github.com/collective/collective.exportimport/tree/modified_date_parser : use dateutil to parse the string on import.
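
The workaround presumably boils down to something like this sketch (assuming python-dateutil is available):

    # Sketch: parse the ISO date with dateutil instead of strptime,
    # since dateutil copes with the ":" in the timezone offset.
    from dateutil import parser

    modified = u"2011-10-13T13:49:57+01:00"
    modified_data = parser.parse(modified)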

However:

  • dateutil's parser is advanced magic; you can't/don't even pass a format string.
  • Should we maybe fix this on export of Archetypes content instead of enhancing the import?
  • Are there maybe other small deviations in the date export, and is it dependent on locale or just a difference between the AT/DX serialisers?

If this is specific to AT, this fix could be added/combined with other small tweaks to make the AT export more compatible with DX import and put behind a checkbox/switch as discussed in #7.

Handle site root object default page

Currently, we do not export/import the default page of the site root. I propose to use an empty string for the "uuid" dict value to denote the default page of the site root, as a special case.
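
A sketch of how an importer could branch on that convention (the "default_page" key and the plone.api calls are for illustration only; the convention itself is just the proposal above):

    # Sketch: an empty "uuid" is the proposed special case for the site root.
    from plone import api

    uid = item.get("uuid", "")
    container = api.portal.get() if uid == "" else api.content.get(UID=uid)
    if container is not None:
        # Plone stores a folder's default page in this attribute.
        container.default_page = item["default_page"]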
