Flame alert: the Plone Python 3 upgrade story

zopyx · September 4, 2020, 7:35am

My point: in many cases you accumulate a lot of trash in your persistent objects over the years. And at some point you want to start with a fresh and clean state. There is a lot of fun you can have with in-place migration with regard to culprits like persistent utilities, references and so on.

I fully agree that an in-place migration is perhaps the better solution when your Plone installation is more or less "clean" and in a solid state.

However, almost all Plone migrations that we did over the last years involved changes to code, functionality and structure. And in-place migration usually cannot deal with that. So the standard approach here is export-import. I have now worked for almost a year on the University Gent migration (90.000 content objects) where a significant amount of time went into porting code to Python 3, removing old add-ons and replacing them with decent implementations (either from the community or with new code). We are coming to the end of the project and the final rollout. An in-place migration for this project would never have worked. Starting from scratch with a clean state was the right decision.
The migration approach and code we developed over the last year works well, although a lot of testing and fixing was needed, in particular for the migration of topics to collections and PFG to EasyForm.

pbauer · September 4, 2020, 9:42am

Can we agree on the following?

We need documentation describing the different approaches for migrations that can be used to decide which to choose for any given project.
The docs for in-place migrations should be improved. I've already invested a lot of time into various aspects of these docs over the years but it would be good to assemble the bits and pieces in one place.
The documentation for transmogrifier (https://training.plone.org/5/transmogrifier/) should be improved further. I don't think transmogrifier should become the recommended way for newbies because it is too complex.
plone.importexport should be finished and made compatible with Python 3 and Plone 5.2.2. That could be the recommended way for newbies to migrate content only. A way to migrate users could be added to it. It could even be made to work for migrations from AT to DX.
More people should publish custom migration code.

I'll go first. I recently wrote a rather simple export/import using restapi that works like a charm:

Export: https://github.com/EUDAT-DPMT/pcp.contenttypes/commit/b5ea62b
Import: https://github.com/EUDAT-DPMT/pcp.contenttypes/commit/ae6294a

I used that approach for two migrations from Plone 4.3 to Plone 5.2.2 on Python 3:

To migrate content that was hard to migrate in-place (e.g. ttw-created dexterity content-types) and some weird AT content. That means I migrated the complete portal in-place but removed some content before the migration and used export/import to put it back in the old place after the migration.
To import parts of the old content into a new portal. Various hooks during importing allow modifying the items before and after creation and deciding whether an item should be dropped or changed. In this case I start with a new site and import parts of the old one into specific places.
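
A minimal sketch of the hook idea described above, for illustration only. It is not the code from the linked commits: `import_items`, `before_create`, `after_create` and the item keys are hypothetical names, and the `create` callable stands in for whatever actually creates the content in the target site.

```python
from typing import Callable, Iterable, Optional

Item = dict


def import_items(
    items: Iterable[Item],
    create: Callable[[Item], object],
    before_create: Optional[Callable[[Item], Optional[Item]]] = None,
    after_create: Optional[Callable[[object, Item], None]] = None,
) -> list:
    """Create one object per exported item.

    Hooks may change or drop an item before creation (return None to drop)
    and post-process the created object afterwards.
    """
    created = []
    for item in items:
        if before_create is not None:
            item = before_create(dict(item))  # hooks work on a copy
            if item is None:  # the hook decided to drop this item
                continue
        obj = create(item)
        if after_create is not None:
            after_create(obj, item)
        created.append(obj)
    return created


def drop_topics_and_move(item):
    """Example hook: drop old Topics, put everything else under /archive."""
    if item.get("@type") == "Topic":
        return None
    item["parent_path"] = "/archive" + item.get("parent_path", "")
    return item


exported = [
    {"@type": "Document", "id": "about", "parent_path": "/en"},
    {"@type": "Topic", "id": "news-listing", "parent_path": "/en"},
]

# Stand-in for real content creation: just compute the target path.
result = import_items(
    exported,
    create=lambda item: item["parent_path"] + "/" + item["id"],
    before_create=drop_topics_and_move,
)
```

Keeping the decisions in hooks means the generic loop can be reused between projects while the project-specific rules stay in small functions.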

If anyone is interested I could clean it up a little (I have some more improvements for it in another non-public project) and move it to a small package.

tkimnguyen · September 4, 2020, 12:15pm

this is exactly the summary and recommended path forward I was hoping for

It does indeed sound like there are two ends of the expected complexity spectrum and relatively new or inexperienced site owners should benefit from a guided recommendation.

(When I wrote earlier that we should kill the in-place story, I meant we should document and recommend the other)

My experience has been that simple, mostly uncustomized sites are upgradeable in-place without too much difficulty BUT most sites end up with more customization and functionality. The ZODB upgrade to Py3 scares me...

This upgrade documentation and supporting code/tools should be the focus of a near-future sprint.

tkimnguyen · September 4, 2020, 12:18pm

The reason I still cannot contemplate running WP or letting my friends run WP is stuff like this:

gforcada · September 4, 2020, 4:28pm

An add-on that has a controlpanel where you specify a user/password and a URL.

Once you do that, via REST API, it crawls the specified URL and re-creates the content structure on the calling site. Who builds it?

ericof · September 4, 2020, 6:14pm

I'm so glad @gforcada volunteered to do it

tkimnguyen · September 4, 2020, 8:27pm

Yeah, I sensed a PLIP, right?

tkimnguyen · September 4, 2020, 8:29pm

But seriously, shouldn't GenericSetup be able to export "everything"? (We know it has limitations because it does not do nested content folders, and we had to implement a similar thing outside of GS) Couldn't there be a (say) JSON export of a site?

ericof · September 4, 2020, 8:34pm

I think, ideally, GenericSetup should export the whole content tree.

michael · September 5, 2020, 3:29am

Fully agree that transmogrifier is way too complex; I gave up on several attempts.

Maybe plone.restapi is a better alternative; it just has to be backported to older Plone versions.

djay · September 8, 2020, 2:51am

plone.restapi currently doesn't have a bulk add/update mechanism, which means one add/update per request. That makes it pretty slow for any largish site, and possibly for smaller ones too if you need to do any matching to see whether the content already exists.
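
To make the cost concrete, here is a rough sketch that only plans the HTTP calls and sends nothing. It assumes the usual plone.restapi pattern of `POST` to the container to add and `PATCH` to the object to update; `plan_requests`, the item keys and the site URL are hypothetical, and `existing_paths` stands in for the answers the matching `GET` requests would give.

```python
def plan_requests(site_url, items, existing_paths):
    """Return the (method, url, json_body) calls needed without a bulk API."""
    plan = []
    for item in items:
        container = item["parent_path"].rstrip("/")
        path = f"{container}/{item['id']}"
        # Matching step: one GET per item to see whether it already exists.
        plan.append(("GET", site_url + path, None))
        if path in existing_paths:
            # Update the existing object in place.
            plan.append(("PATCH", site_url + path, item["fields"]))
        else:
            # Add a new object to its container.
            body = {"@type": item["@type"], "id": item["id"], **item["fields"]}
            plan.append(("POST", site_url + container, body))
    return plan


items = [
    {"@type": "Document", "id": "about", "parent_path": "/en",
     "fields": {"title": "About"}},
    {"@type": "Document", "id": "contact", "parent_path": "/en",
     "fields": {"title": "Contact"}},
]

plan = plan_requests(
    "http://localhost:8080/Plone",  # hypothetical target site
    items,
    existing_paths={"/en/about"},
)
methods = [method for method, url, body in plan]
```

With matching, every item costs two round trips, so the request count grows linearly with the number of content objects.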

plone.importexport's design was to use plone.restapi on the backend and add bulk ingestion of a zip file with CSV and blobs in it, plus some rules about how to match and replace or update any existing content. This was designed not only to speed up the plone.restapi update process but also to make content updates possible without writing code, for some scenarios outside of Plone-to-Plone content migration. There was even talk of a Plone-to-Plone direct conversion control panel, like the one mentioned above, sending these zips between running Plone instances.
However, the GSoC project to finish off the existing code fell through, our own projects where we needed this were put on hold, and we don't have the budget to finish it ourselves.
The code is mostly there however. It just needs a better UI and some more tests I believe. It also currently makes too many assumptions about how existing content is matched.

djay · September 8, 2020, 4:12am

I should add that the most feasible method of a Plone-to-Plone migration using plone.importexport is a standalone script that converts a jsonify export into a compatible zip file to import. That doesn't exist yet.

tisto · September 8, 2020, 9:07am

Large migrations that involve changes to code, functionality, and structure are inherently complex. I fully agree with Andreas here. No matter if you use in-place or a transmogrifier-based migration approach, it will be complex and it will require in-depth knowledge of the underlying software stack.

Our use case has been mainly mid-to-large-size migrations (> 25 GB data, >100k objects) for years and I couldn't imagine doing this without the flexibility of transmogrifier. Transmogrifier can handle code structures that are unknown to the target system. With in-place you have to get rid of those, as Philip described, which is the main downside IMHO. Transmogrifier itself is dead simple (at least for a Python programmer familiar with Plone); the complexity lies in the problem we are trying to solve.

Apart from the transmogrifier nitpicking above I fully agree with Philip. Thanks for wrapping things up!

Though, the problem, as always, is not that we don't have a plan for what needs to be done, but that we need people who actually do the work. Philip et al. put a lot of effort into the in-place migration and there is nothing comparable in transmogrifier yet. Therefore, there is no question about what should be the recommended way (-> do-ocracy).

I'd love to see a unified and universally recommended approach with transmogrifier but this is lots of work and cleaning up and documenting a large and complex migration pipeline in transmogrifier is not an easy task.

In addition, even with transmogrifier, there are different approaches and it is not easy to unify those. We started to look into @zopyx's approach of using ArangoDB as a middle layer in the migration in a recent project, for instance. Though, due to deadlines and project budgets, we had to give up on it. We might give it another shot in the future though. Maybe at some point we could devote a sprint to this topic...

zopyx · September 8, 2020, 9:36am

I plan to polish the stuff that I've written for the University Gent migration later this fall. Some additions to collective.jsonify have been backported already to the main repository. The importer part is in most cases working out of the box for the standard types... of course, every project is different and there is always a need to touch the migration code. Migrations will never work out of the box unless you have an unmodified Plone site.

fredvd · September 8, 2020, 10:33am

More people should publish custom migration code.

I have watched @zopyx's huge efforts for the University of Ghent project. Andreas can expand on this, but there is a large 'long tail' of attributes and parameters on content objects for the many different features we have and 'have had' in Plone 'the application' for generic CMS functionality: default_view, layouts, content rules, constrain_types, permissions, portlets, etc.

(I'm leaving out the custom content types for the moment. And there's also the site settings, user/groups and other control panel config.)

Depending on whether you have used that core Plone functionality in the past, you may not notice that you are losing those settings in an ETL migration until you start testing the results in the destination site. But testing only starts working once the theming and the migration of the individual content types have also been done: the destination site has to be in place before you can start testing.

And considering the age of the Plone site you are now migrating to 5.2, these attributes can also contain 'legacy' values from earlier in-place migrations through core Plone 2, 3 and 4 versions, which are sometimes hard to recognise, or values that only work with your (old) customisations/add-ons. On top of that, our friend Acquisition can trick you when you query the attributes in the extract phase and fetch non-existing attributes from parents.
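
The Acquisition pitfall can be illustrated with a small pure-Python simulation. The `Node` class and `own_attribute` helper below are hypothetical stand-ins, not Zope code; in a real export script the usual guard is to look the attribute up on `Acquisition.aq_base(obj)` instead of on the wrapped object.

```python
class Node:
    """Tiny stand-in for an acquisition-wrapped content object:
    attribute lookups that fail locally fall back to the parent."""

    def __init__(self, parent=None, **attrs):
        self.__dict__.update(attrs)
        self.parent = parent

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        parent = self.__dict__.get("parent")
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)


def own_attribute(obj, name, default=None):
    """Return the attribute only if it is stored on obj itself.

    Simplification: this looks at instance attributes only, whereas a
    lookup on an unwrapped Zope object would also see class attributes.
    """
    return vars(obj).get(name, default)


folder = Node(layout="folder_summary_view")
page = Node(parent=folder, title="About")

naive = getattr(page, "layout", None)    # silently taken from the folder
strict = own_attribute(page, "layout")   # the page has no layout of its own
```

Exporting `naive` would stamp the folder's layout onto every child page, which is exactly the kind of value that looks plausible in the export and only causes trouble in the destination site.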

The Ghent migration part has had quite some iterations to figure out the different attributes. A few times we were debugging issues in the theming, or a custom content type, only to find out that there was a special case (in custom code) which depended on a not migrated attribute. Rinse and repeat.

The end result, a working migration profile/code, be it in transmogrifier or Andreas' configuration/setup, is the combination of configuration for core Plone attributes, add-ons and customisations, for the Plone features used in that particular project.

If those can be split, so that we can work together on making the core Plone attributes/content types as complete as possible and have a clear configuration split in those pipelines where builders of a migration tool can put their custom code, it will be much easier to publish and re-use pipelines.

But that would take quite some more effort, as described above, and cleaning up your customised pipeline for one Plone site while still knowing which blocks did what, after many months of working on a migration, is nobody's favorite community effort.

Another aspect is that a migration project always takes too long and costs too much because of the 'unforeseen', so if you know you don't need portlets, multilingual or content rules, or users because of LDAP, you not only skip those, but your other code also doesn't need to take them into account.

We have used ETL for one large project at Zest: @mauritsvanrees adapted transmogrifier to our needs for 18 very similar sites, where all content was extracted into a single large JSON file per Plone site and could then be loaded again. One special feature there is that the extract phase analysed which Images and Files were actually used in Documents, and migrated only those that were referenced, to clean up the target site. Worked. Sort of
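
A rough sketch of that "only migrate referenced Images and Files" filter, under loud assumptions: it is not the Zest code, the item keys (`@type`, `path`, `text`) are a hypothetical export layout, and links are assumed to be plain absolute paths. Real sites often link via `resolveuid/...` or relative URLs, which this does not handle.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/src targets from one HTML fragment."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Drop image scale suffixes such as /@@images/image/preview.
                self.targets.add(value.split("/@@images")[0])


def referenced_paths(items):
    """Return every path linked from the text of a Document."""
    found = set()
    for item in items:
        if item["@type"] == "Document":
            collector = LinkCollector()  # fresh parser per document
            collector.feed(item.get("text", ""))
            collector.close()
            found |= collector.targets
    return found


def referenced_only(items):
    """Keep all items except Images and Files that no Document links to."""
    used = referenced_paths(items)
    return [
        item for item in items
        if item["@type"] not in ("Image", "File") or item["path"] in used
    ]


export = [
    {"@type": "Document", "path": "/site/about",
     "text": '<p><img src="/site/images/logo.png/@@images/image/preview" />'
             '<a href="/site/files/report.pdf">Report</a></p>'},
    {"@type": "Image", "path": "/site/images/logo.png"},
    {"@type": "Image", "path": "/site/images/unused.png"},
    {"@type": "File", "path": "/site/files/report.pdf"},
]

kept = [item["path"] for item in referenced_only(export)]
```

Anything linked from outside the rich text (portlets, templates, custom fields) would be missed by a filter like this.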

djay · September 8, 2020, 10:44am

or unless they are willing to accept just migrating simple content and users and starting again with everything else... which a lot of people might be ok with. An upgrade is a good time for a new theme and a lot of plugins have to be replaced anyway so their settings have to be redone.
I think there could be selection bias going on, because we are too focused on the jobs we are lumped with as professionals, which is to make everything work perfectly, including a 100% lossless migration.

erral · September 8, 2020, 11:28am

In all of our latest migration cases from 4 to 5 we are doing a full jsonify export and transmogrifier import. Usually the site gets a restyling, and those Plone 4 sites are not Diazo-based, so we would be redoing the theming anyway; we did not even try to migrate those sites in-place.

Every migration is different from any other one, but we tried to create a base transmogrifier pipeline, and created a bobtemplate to generate it. We usually start with that template and then add the custom parts: https://github.com/codesyntax/bobtemplates.cs/tree/master/bobtemplates/cs/cs_migration/

cekk · September 8, 2020, 12:23pm

Probably this is one of the best threads ever
I have to do a Plone migration more or less every month and I like to see how other people handle this.

IMO there isn't a silver bullet solution for everyone, and the best idea could be to document all the "official" possibilities that people have. And why not add some useful custom ones.

For example if you want to migrate a Plone site to use Volto, the basic in place migration isn't enough because you also need to convert text into draftjs-compatible blocks.

We migrated a LOT of Plone sites in the past from Plone 3/4 to 5 and now from Plone 4/5 to Python 3 and Volto.

Most of them had a lot of customizations and we decided not to use in-place migration for the same reasons: we took this as an opportunity to start with a fresh site and didn't want to deal with possible ZODB problems.

We (obviously) used transmogrifier to do this. Probably it's not the fastest solution, but we ended up with a set of (extensible) products that perfectly match our needs:

These tools (like other ones, I suppose) are basically only content migrators (also users in our case). In our experience we didn't need to migrate site configurations, workflows or anything else, because usually all site configurations/customizations are set in our policy products.
The same goes for portlets, because every migration was an opportunity to add a new skin and rethink the site structure for customers.

pbauer · March 28, 2021, 10:42am

I've released a package that contains all the code we used in recent migrations that we did not do in-place:

Features

Export & Import content
Export & Import members and groups with their roles
Export & Import relations
Export & Import translations
Export & Import local roles
Export & Import order (position in parent)

Export supports:

Plone 4, 5 and 6
Archetypes and Dexterity
Python 2 and 3
plone.app.multilingual, Products.LinguaPlone, raptus.multilanguagefields

Import supports:

Plone 5 and 6
Dexterity
Python 2 and 3
plone.app.multilingual