GSoC 2018 Ideas: Improve Plone's import/export story

This is one post in a series to begin focused discussion about ideas that came out of our 2018 GSoC Brainstorm.

Please use this post as a place to begin to discuss this idea in more depth.

This idea was proposed by @jean, and both he and @djay have been mentioned as possible mentors.

The description:

"Shriyansh Agrawal's @Shriyanshagro GSOC project was Plone import/export. It looks most of the way there, but it's not all done. Finish the job!

In addition, I (Jean) have been working with the Orchard CMS. One of the things I appreciate is the very powerful, flexible and hackable import/export mechanism.

In Orchard, import/export can handle one or more of:

  • site settings (including e.g. features activated),
  • content type definitions,
  • and content.
    Import creates new or modifies existing types and content instances, and is either cumulative or can reset the site to a known state.

It uses an informally structured XML format and clever ID mechanism that's very easy to figure out without knowledge of internals.

I think Plone should steal some of these ideas."

Go on, community! Make this one shine!


@Shriyanshagro has commented here with some ideas for what improvements can be made in a new run of this project.

For the sake of Plone's future: This is quite important and should be 'finished'.

In my opinion, there should be a strong focus on 'not too technical' users, so some kind of wizard should be available directly from the control panel (including links to docs and maybe a video). [The Big Green Button :slight_smile: ]™

This scares away people (?): https://docs.plone.org/develop/import/index.html

Hi, I'm new to open source, but I would absolutely love to do this project for GSoC. Coincidentally, I did something similar for a personal project, which involved dealing with an API and parsing JSON/CSV files, so I got slightly excited when I saw this thread :stuck_out_tongue: Seems like a fun project to do.

@xuan-hh, that's great to hear! Welcome to the Plone community!

If you are really interested in this project, or in working on Plone in general, you've taken exactly the right first step in speaking up. Perhaps you would take a look at the work that was done last year on this topic so you can gain some insight into what the goals were, and what remains to be completed. I believe that @frapell was the mentor for the project last year.

You'll also want to look at this post from last year's GSoC and check out the steps at the top to get started working with Plone. This is a large and long-lived code base, with a lot of corners. You'll want to start exploring as soon as possible so you have some familiarity with what Plone is, what it does, and how it works.

Again, welcome!

I would also like to work on this project for GSoC 2018.

the first thing to be done in plone.importexport is a huge code clean up: we do Python, not Java or any other language; I was reading the source code and it really hurts.

we follow PEP 8 and other code conventions defined in our style guide; we sort imports and we don't import things that we are not going to use; we use code analysis tools; and we write tests and enforce that tests are passing before merging:

https://travis-ci.org/collective/plone.importexport/jobs/313261672

Python has docstrings that must be written below declarations, not above; Python uses CamelCase just for classes, the rest of the stuff must use snake_case unless you have a good reason or it is legacy stuff; so please also follow our naming conventions.
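To make the two conventions above concrete, here is a minimal sketch. The class and method names are hypothetical and are not taken from plone.importexport; they only illustrate where the docstring goes and which names use CamelCase versus snake_case.

```python
class ContentExporter:
    """Collect item paths for export (hypothetical example class)."""

    # The docstring sits directly below the declaration, not above it.
    def __init__(self, site_id):
        self.site_id = site_id
        self.exported_paths = []

    # Methods, arguments and attributes use snake_case; only the class is CamelCase.
    def export_item(self, item_path):
        """Record one item path and return how many items were recorded so far."""
        self.exported_paths.append(item_path)
        return len(self.exported_paths)
```

Because the docstrings are placed below the declarations, tools such as `help()` can pick them up through `__doc__`.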

whoever mentors this work this year, please ensure this basic stuff is followed for the sake of everybody.


I would say that one of the main goals of this package must be importing content from old sites using the output JSON format generated by collective.jsonify.

why? collective.jsonify seems to work in Plone from version 2.1 (at least) to 4.3; so, supporting this format out of the box (while getting rid of Transmogrifier in most cases) would be a great step in simplifying migrations.

the import process should take care of migration of standard Archetypes-based content types.

in a new iteration we could allow migration of third party content types by mapping their fields, if needed; or maybe by generating those new Dexterity-based content types on the fly.

Yes, I too agree with it.

I didn't use transmogrifier or collective.jsonify, but the Python CSV and JSON libs to inter-convert JSON and CSV data.
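For readers who have not done this kind of conversion before, a rough sketch using only the standard library `csv` and `json` modules might look like the following. This is not the code from plone.importexport; the function names are made up for illustration, and it assumes the JSON is a list of flat objects.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON list of flat objects to CSV text."""
    records = json.loads(json_text)
    # Collect the union of keys in first-seen order, so records with
    # missing fields still fit (they get an empty cell).
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_json(csv_text):
    """Convert CSV text back to a JSON list of objects (all values become strings)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps([dict(row) for row in reader])
```

Note that the round trip is lossy: CSV has no types, so numbers, booleans and nested values would all come back as strings unless extra handling is added.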

This addon is built on top of plone.restapi, hence the import is taken care of by it. However, it lacks import of some content settings like STATE, which I then included in plone.importexport exclusively.

We had a thought on this idea, but it was way more difficult to implement, as there could be a number of possibilities when customising content types, and it varies from user to user.

yes, I know; IMO the most important use case is migrating from an older Plone version to Plone 5.x.

but that's only my opinion; I know you were working on other use cases.


It is an important use case, but isn't it more of an integrator use case? Integrators should be comfortable with transmogrifier, which is not hard to use to convert the jsonify format. Alternatively, converting jsonify to the CSV or other JSON format should not be hard using some custom Python code.
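As a rough idea of what such custom Python code could look like, here is a small key-renaming sketch. The key names on both sides (`_type`, `_id`, `@type`, `id`) are assumptions for illustration and should be checked against a real export and the target format; `KEY_MAP` and `convert_record` are hypothetical names.

```python
# Hypothetical mapping from source keys to target keys; adjust to the real export.
KEY_MAP = {
    "_type": "@type",
    "_id": "id",
    "title": "title",
    "description": "description",
}


def convert_record(source):
    """Return a new dict holding only the mapped keys, renamed for the target format."""
    return {target: source[key] for key, target in KEY_MAP.items() if key in source}
```

Keys that are not in the mapping are dropped on purpose, so anything the target side does not understand never reaches the import step.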

Yes, as I already said this addon is built on top of plone.restapi. So the migration limits solely depend upon plone.restapi.
During the development period, I tried migrating Plone v4.0 data to Plone v5.0 and it worked out very well. Though the limits of migration still need a check.

I think it would be great if it was possible to

  1. Install an add-on (collective.something) on both sites.
  2. Go to the control panel, choose export, and choose a few options (like what to export or what not to export).
  3. Go to the control panel on the new site and choose import. Select between a few options, like what to do on error (log, stop, continue, etc.).
  4. Maybe, if necessary, be able to 'make decisions as the migration runs' (there is an error on migrating 'field A', should I migrate the others, or manually enter field A?).
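The "what to do on error" option from step 3 could be sketched roughly as below. This is only an illustration under stated assumptions: `run_import`, `import_one` and the policy names are hypothetical and do not come from any existing add-on.

```python
import logging

logger = logging.getLogger("import_sketch")


def run_import(items, import_one, on_error="log"):
    """Import items one by one, applying the chosen error policy.

    on_error: "stop" re-raises the first error, "log" logs it and moves on,
    "continue" skips the failing item silently.
    Returns a tuple (imported, failed) of item lists.
    """
    if on_error not in ("stop", "log", "continue"):
        raise ValueError("unknown on_error policy: %r" % (on_error,))
    imported, failed = [], []
    for item in items:
        try:
            import_one(item)
        except Exception as exc:
            if on_error == "stop":
                raise
            if on_error == "log":
                logger.warning("could not import %r: %s", item, exc)
            failed.append(item)
        else:
            imported.append(item)
    return imported, failed
```

The returned list of failed items is what an interactive step like number 4 could then present to the user for a decision.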

Yes that was the spirit behind this addon :wink:

Hi! This seems like a great project. I am new to the community, but I would like to work on this. I know Python, Django, React, etc.

Hi @tulikavijay, and welcome to the Plone community!

If you're interested in working with us for the 2018 GSoC, you'll want to get started by learning a bit about Plone, what it is for, how to use it, and how it works. You've taken the all-important first step of speaking up here in our forum. We've outlined a few useful tips on how to get started and you should work your way through those. We are happy to answer any questions you might have, especially if you can dig a bit and look for answers yourself first.

Again, welcome to Plone! We are glad you're here and we look forward to hearing more from you.

Note there's another new kid on the block:

I asked the author, Ramon Bartl, whether it's a generic Plone solution, and he answered:

senaite.sync is capable to sync two Plone instances as well using plone.jsonapi.routes. It can handle custom content types, references and metadata, e.g. the review_history, review_states etc. Currently @juangallostra and @Nihadness are working on senaite.sync to import huge data from a remote site into Senaite. We’re all excited about this project :grinning:
ref

Yes, this project sounds useful, as it would help to establish routes for data transfer in JSON.

But can it serialize/deserialize data into Plone instances?

Is this project still under consideration for GSoC'18?

Yes it is :slight_smile: