Flame alert: the Plone Python 3 upgrade story

thanks @pbauer
We had already planned to use and extend collective.exportimport for migrating our >40GB Data.fs and 130GB blobstorage project in the coming months.

Most portlets are assigned per content type or per group (so there is no need to migrate them), but there are a few user-defined ones.
Not sure if it will be worth the effort to handle them in the migration.

Will share the findings here in the forum.


Great, I'm excited to hear how that works out.
I guess at least saving the files to the server would need to be optimized, since keeping 100GB of files in memory at the same time is not feasible. You could also choose not to export/import blob data and restore it afterwards. The oid (on which the blob path is based) is not exported/imported though, and I have never checked what would be required to keep the same blob path for exported/imported content...

I tested collective.exportimport on a project where we are planning (and have almost completed) an in-place migration of the content from Plone 4 to Plone 5.x. This particular site has been upgraded from Plone 2.x upwards. While trying to get the in-place migration working, small exceptions surfaced (leftover component registrations, broken items, etc.), hence my spike to test this route.

With limited testing, collective.exportimport shows great potential; I'm already restoring some content. But I have run into some issues of which I'm not sure whether they are generic enough to add to the add-on, or edge cases to override in my own export/import class as suggested in the README.

First: the imports/exports all use plone.restapi's (de)serializers, and I think there are some small differences between the AT and DX ones.
The description field on some content items is exported by plone.restapi in Plone 4 as:

"description": { "content-type": "text/plain", "data": "some text" },

whereas in Plone 5 with Dexterity it is exported as

"description": "text"

But Plone 5.2's restapi deserializer/importer complains about this (sub)object with a WrongType error, without any hints. After hitting this, I first continued testing export and import from and to Plone 4, where the problem doesn't show up.
I have to dig further to see whether this is indeed a difference between the AT and DX serializers.


Then I found out this Plone 4 site has some content ids with spaces, i.e. URL-encoded like this%20is%20my%20id. I patched the exporter to search/replace the id and then export the folders. This could go in a global_dict_hook().
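Such a hook could look roughly like this (a sketch; the exact hook signature in collective.exportimport may differ, and replacing %20 with a dash is just one possible choice):

```python
def fix_encoded_ids(item):
    """Rewrite URL-encoded ids like this%20is%20my%20id (sketch)."""
    for key in ("id", "@id"):
        if "%20" in item.get(key, ""):
            item[key] = item[key].replace("%20", "-")
    return item
```

The same replacement would then have to be applied consistently to any paths referencing the item, e.g. in relatedItems.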

But I'm running into a related issue some steps later: when I start importing my exported Documents/Pages, the relatedItems restore crashes in the Archetypes ReferenceEngine on relations to other items that also have %20 in their ids. This could turn out to be a non-issue in the end, because I don't want to restore content to a Plone 4/AT site, but to a Plone 5.2/DX one.

"relatedItems": [ "http://localhost:9050/plone/path/to/something/with/url%20encodes.doc",
I think it is best in this case to first 'fix' all ids with %20 in the Plone 4 site before starting any migration.


Then, in the Plone 4 site, the modification dates have a different (invalid/custom?) timezone component, with "+01:00" as the UTC offset, where the %z in collective.exportimport's:

datetime.strptime(modified, "%Y-%m-%dT%H:%M:%S%z")

only seems to work correctly in Python 3 according to online resources, but it still doesn't like the ":": the full date string should be u'2011-10-13T13:49:57+0100', not u'2011-10-13T13:49:57+01:00'.

from dateutil import parser
parser.parse(modified)

as found via a Stack Overflow search (copy/paste) solves the invalid %z traceback.
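If you'd rather stay with the standard library instead of adding dateutil, a sketch that normalizes the colon in the offset before handing the string to %z (assuming the timestamps always end in ±HH:MM or ±HHMM):

```python
import re
from datetime import datetime


def parse_modified(value):
    # Older Pythons' %z accepts "+0100" but not "+01:00",
    # so drop the colon from the offset before strptime.
    value = re.sub(r"([+-]\d{2}):(\d{2})$", r"\1\2", value)
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


dt = parse_modified("2011-10-13T13:49:57+01:00")
```

On Python 3.7+ strptime accepts the colon form anyway, so the regex is simply a no-op there.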


So, for the few hours I spent on this, it looks very promising for less painful small-to-medium site migrations with mostly default content :+1: :+1:. You never know beforehand what's lurking in the pickles in your ZODB after 10-15 years of a site's existence. And you also get a very quick round trip to find out where the issues are when you do an export/import.

The only exception to the clear feedback so far is the description problem, where these lines in the DX deserializer in plone.restapi hide the real Python exception. Maybe nice for frontends, but not for migrations :-/

Thanks for the great feedback. I also encountered the issue with description. The problem is that atapi.TextField() is also the field used for RichText. I fixed it like this in a dict_hook during export:

# Text is handled like RichText (an AT issue, probably)
for fieldname in ["text", "description"]:
    if isinstance(item.get(fieldname, None), dict):
        item[fieldname] = item[fieldname]["data"]

Since importing to AT is no valid use-case anyway, I should probably add that to the default export for the description field.
I never tested importing to Plone 4 and/or Archetypes. @@export_relations supports both, but @@import_relations only supports Dexterity.

I use the add-on for projects with many custom types and even custom fields (e.g. Archetypes DataGrids with added Raptus multilingual support) that use a weird custom storage for attributes. Using a mix of dict_hooks and obj_hooks during export and import has allowed me to solve all use-cases so far.

The only thing that is not possible so far is importing a nested structure where folders are contained within other types that are themselves within folders again. The reason is that importing folders first and then the other folderish types creates the structure for the remaining content. If the parent of an item is missing, it is created as a Folder. But there are probably ways to deal with that issue once I run into it :slight_smile:
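One way to deal with it might be to order the exported items by path depth before importing, so any container is created before its children regardless of its portal type (a sketch; it assumes each exported item carries its full path in "@id", as the plone.restapi serialization does):

```python
def order_by_depth(items):
    # Parents have shorter paths than their children, so sorting by
    # the number of path segments imports containers first.
    return sorted(items, key=lambda item: item["@id"].count("/"))
```

With that ordering, no item would ever be imported before its real parent exists, so the Folder fallback would not be needed.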

One thing that is missing is migrating default_page. I fixed that in a project with a dict_hook on export and an upgrade step that runs after import, but export/import of default_page should go into the add-on.

Also: issues and pull requests for GitHub - collective/collective.exportimport: Export and import content, members, relations and translations are more than welcome!

So we would need to store the blob path/oid in the exported JSON.

The plone.restapi deserializer would then have to restore the blob on the target object, but that is maybe something that has to be added later, after the create factory is called.

And maybe the blob registration needs to be recreatable in the ZODB when the blob file itself is not yet there on the filesystem (i.e. add something like gracefullblobmissing, but without creating a dummy placeholder). Or we would have to make that a requirement.

@pbauer If it helps, here is the UI design for where plone.importexport was going to go.

How hard do you think it would be to add these functions on top of what you have (obviously some are more important than others)? I trust your code more than what's currently in plone.importexport :slight_smile:

Originally it was an action like you currently have it, but after a discussion with @Albert it was switched to a control-panel function with the ability to select the path you want to export from/import to. The main reason for this is that it's really an admin function, not an "editor" function. There is the concept of an admin setting up a preset profile which could be made available for regular imports/exports by editors via an action.

I think supporting CSV as an optional additional format to the one you currently have would not be too hard. The code is mostly there for that; I'm just not sure it's very memory efficient.
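For memory efficiency, the CSV could be streamed row by row instead of building the whole document in memory, e.g. (a sketch, not plone.importexport's actual code; field selection via `fieldnames` is assumed):

```python
import csv
import io


def iter_csv(items, fieldnames):
    # Yield the CSV one line at a time so only a single row
    # is ever held in memory, however large the export is.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    yield buf.getvalue()
    for item in items:
        buf.seek(0)
        buf.truncate()
        writer.writerow(item)
        yield buf.getvalue()
```

A generator like this plugs straight into a streaming HTTP response, so even a 100GB export never has to fit in RAM.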

Profile
[News Sync             ] [Load] [Save]
 [ ] Allow contributors to use this profile in the Actions menu
  
____|Export|___|*Import*|____

Import File(s)
- *Warning*: your import is large, so it will be done in multiple transactions from the browser. Please don't close your browser during upload. Aborting won't be possible.
- a zip containing files in folder structures with an optional index.csv containing metadata (see format), or a single CSV metadata file, or drag and drop a folder here
[ /tmp/myzip.zip          ] [browse]

Primary Key
- Field in import metadata to match
[UUID                   ]

Existing content 
- to replace or update or add into
{query widget}
Path: /news
Creation date: > 1/1/2018

If Content Matched
(o) Update 
( ) Replace 
( ) Rename existing 
( ) Rename new 
( ) skip 
( ) Abort

If Content is New
- relative paths will be added to the first path found in the query widget
- Content type field must be specified or use content type in query
(o) Add and create folders
( ) Skip if folders don't exist
( ) Skip 
( ) Abort

If Existing doesn't Match 
( ) Remove  
(o) Skip 
( ) Abort

If more than one Match
( ) Remove all except first
( ) Skip all but the first
(o) Abort

Settings imports
[ ] Users (acl_users.csv)
[ ] Themes (portal_resources/*)
[ ] Registry (portal_registry.csv)
[ ] Generic Setup (portal_setup/*.xml)


[Import] [Dry Run] [Cancel]

Progress: 423/1024 (20s/43s)

383 Items have been updated 
23 items didn't match and skipped (view...)
2 items were added (view...)
10 updates skipped due to permissions
5 adds skipped due to permissions
[Download log]

and export would look like this

Profile
[News Sync             ] [Load] [Save]
[ ] Allow contributors to use this profile in the Actions menu
    
____|*Export*|___|Import|____

Existing content 
- to export
{query widget}
Path: /news
Path: /other-news
Creation date: > 1/1/2018

Export Contents
[x] Metadata (CSV format)
[x] Files (Zip format)
[  ] Users
[  ] Settings
[  ] Themes 

Metadata to export
- *Warning*: not including Type, UUID can result in broken Plone imports
( ) All (o) Selected fields
[relative path, effective_date, title              ]

[Export] [Dry Run] [Cancel]