Content Import and Export

Hello @djay, @ebrehault, @cewing and @datakurre. Let this thread be for the discussion of GSoC'17 Project Idea - "Content Import and Export"

I'm Shriyansh Agrawal, a GSoC aspirant interested in this project.

Summary

Strategically, good content import/export is important because:

  • It allows new users to get started quickly.
  • It removes an obstacle for users who expect a SQL database they can import into and export from.
  • It makes regular bulk uploads or syncs of content from external sources easier, including for non-technical users and non-Python developers.

The end goal is to make Plone more approachable for webmasters, which will in turn help grow the install base.

Technical requirements with online UI

  1. Permissions and security must be respected. Users with lower roles can still use it, but only with the content and fields they have access to.
  2. Both CSV and JSON import and export of content (using separate files for binary/content). CSV is included so non-technical users can update metadata. Both formats will be able to hold the same data; CSV will need to use quoted JSON for certain parts.
  3. Help when imports go wrong, e.g. a dry-run mode and reports on content created, skipped, etc.
  4. It will allow both object creation and finding and updating existing content via various unique attributes such as path or custom fields.
  5. It will work for metadata, content and binary content, or any combination of these (i.e. just a metadata refresh if required).
  6. It should be possible to export content and then reimport it into a new site with a different version such that almost all data is retained.
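
To make requirements 2 to 4 a bit more concrete, here is a minimal sketch in plain Python, not code from any existing add-on. It reads CSV rows whose structured cells hold quoted JSON, matches rows to existing items by path, and reports what a run would create, update or skip. The names `parse_cell` and `plan_import`, the column names, and the `site` dict standing in for a real content tree are all assumptions made for illustration.

```python
import csv
import io
import json


def parse_cell(value):
    """Decode a cell as JSON when it looks like a list or object, else keep the text."""
    if value and value[0] in "[{":
        return json.loads(value)
    return value


def plan_import(csv_text, site, dry_run=True):
    """Match rows to items by path and report created/updated/skipped.

    `site` is a plain dict (path -> field dict), a stand-in for real content.
    With dry_run=True nothing in `site` is changed.
    """
    report = {"created": [], "updated": [], "skipped": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = row.pop("path", "")
        if not path:
            # No unique attribute to match on, so the row cannot be imported.
            report["skipped"].append(dict(row))
            continue
        fields = {key: parse_cell(value) for key, value in row.items()}
        action = "updated" if path in site else "created"
        report[action].append(path)
        if not dry_run:
            site.setdefault(path, {}).update(fields)
    return report


# Hypothetical sample data: the "subjects" column carries quoted JSON.
CSV_TEXT = '''path,title,subjects
/news/launch,Launch day,"[""plone"", ""gsoc""]"
/news/old,Old item,[]
,No path,[]
'''
site = {"/news/old": {"title": "Old"}}
report = plan_import(CSV_TEXT, site, dry_run=True)
print(report["created"], report["updated"], len(report["skipped"]))
```

Running the same call with `dry_run=False` applies the changes to `site`; a real implementation would also have to apply the permission checks from requirement 1 before touching anything.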

Implementation

It will be implemented as an add-on, or as an extension of an existing add-on, that can be incorporated into Plone at a later date. collective.importexport is an example of an existing add-on that could be extended.

Skills

Mainly Python. Some UX skills would help in creating an intuitive UI, but this can be provided by the mentor.

Can someone please elaborate on this need?
[quote="Shriyanshagro, post:2, topic:3789"]
Permissions and security to be respected
[/quote]

I guess this could be handled with an existing Plone auth add-on?

Can the mentors shed some light on this and suggest some dry-run tools?
[quote="Shriyanshagro, post:2, topic:3789"]
import it into a new site with a different version such that almost all data is retained
[/quote]

I didn't get this :confused: What do you mean by a different version?
[quote="Shriyanshagro, post:2, topic:3789"]
for both object creation as well as finding and updating existing content
[/quote]

For object manipulation I have started reading the ZODB documentation. Is there anything else I need to look into?

Happy to see that this add-on already exists. It would be best to extend it, together with Plone's existing authorization add-on, to meet the requirements.

We have Transmogrifier for migrations, where imports and exports are important.
The commonly accepted format here is JSON for moving data between platforms. I have seen little need for exports/imports at the SQL level during 15 years of Plone and Plone migrations. SQL connectivity might be a Transmogrifier step, nothing more, nothing less.

-aj

@zopyx please read the context before replying. This is specifically about the GSoC project, which aims to provide a more end-user-friendly mechanism to replace Transmogrifier.

Users who want to export from or import into Plone tend to expect either a) some kind of built-in import/export feature, or b) that Plone uses SQL and therefore they can hack the SQL to get at content. Our aim is to reduce the number of support requests to the community about how to import/export.

What this means is that if someone doesn't have access to view a document, then export shouldn't let them export it. If they can't edit an object, then import shouldn't let them import it. If they can't change sharing settings, then import shouldn't let them change sharing, and so on.
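
As a rough illustration of that rule, the sketch below filters export by the "View" permission and import by "Modify portal content". It is plain Python with a deliberately simplified role model; the item layout and the names `allowed`, `export_items` and `import_updates` are assumptions for the example, and a real add-on would ask Plone's own security machinery instead of comparing role sets by hand.

```python
def allowed(user_roles, item, permission):
    """True if any of the user's roles is granted `permission` on the item."""
    return bool(user_roles & item["permissions"].get(permission, set()))


def export_items(items, user_roles):
    """Export only the paths the user is allowed to view."""
    return [item["path"] for item in items if allowed(user_roles, item, "View")]


def import_updates(items_by_path, updates, user_roles):
    """Apply updates only where the user may modify; report the rest as refused."""
    applied, refused = [], []
    for path, fields in updates.items():
        item = items_by_path.get(path)
        if item is not None and allowed(user_roles, item, "Modify portal content"):
            item["fields"].update(fields)
            applied.append(path)
        else:
            refused.append(path)
    return applied, refused


# Hypothetical content with per-item permission-to-roles mappings.
items = [
    {
        "path": "/public",
        "fields": {"title": "Public"},
        "permissions": {"View": {"Anonymous", "Member"},
                        "Modify portal content": {"Editor"}},
    },
    {
        "path": "/private",
        "fields": {"title": "Private"},
        "permissions": {"View": {"Owner"},
                        "Modify portal content": {"Owner"}},
    },
]
visible = export_items(items, {"Member"})
by_path = {item["path"]: item for item in items}
applied, refused = import_updates(
    by_path,
    {"/public": {"title": "X"}, "/private": {"title": "Y"}},
    {"Editor"},
)
print(visible, applied, refused)
```

The refused list is the kind of thing an import report could show, so that a lower-role user sees which rows were ignored and why.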

This just means the future code will give help if imports go wrong, e.g. you tell an import to overwrite certain content by specifying paths, but the paths don't match. A dry-run feature is just one way to achieve this.
[quote="Shriyanshagro, post:3, topic:3789"]
import it into a new site with a different version such that almost all data is retained

I didn't get this :confused: what do you mean by different version??
[/quote]

For example, exporting from Plone 4 and importing into Plone 5, etc.
[quote="Shriyanshagro, post:3, topic:3789"]
for both object creation as well as finding and updating existing content

For object manipulation I have started reading the ZODB documentation, anything else do I need to look into??
[/quote]

Probably all the object manipulation can be handled by plone.restapi.
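
For a feel of what that could look like, here is a small sketch that only builds the HTTP requests and does not send them. plone.restapi creates content with a JSON POST to the container and updates it with a PATCH to the item, both with an `Accept: application/json` header. The site URL, the credentials and the helper name `build_request` are hypothetical.

```python
import base64
import json
import urllib.request


def build_request(url, method, payload, username, password):
    """Build (but do not send) a JSON request with basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical local site and credentials.
SITE = "http://localhost:8080/Plone"

# Create a Document inside the /news folder.
create = build_request(
    SITE + "/news", "POST",
    {"@type": "Document", "id": "launch", "title": "Launch day"},
    "admin", "secret",
)

# Update the title of the existing item, found by its path.
update = build_request(
    SITE + "/news/launch", "PATCH",
    {"title": "Launch day (updated)"},
    "admin", "secret",
)

# urllib.request.urlopen(create) would send it; omitted so this runs offline.
print(create.get_method(), create.full_url)
print(update.get_method(), update.full_url)
```

Because the requests run as the authenticated user, the permission checks happen on the server side rather than in the import script.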

Hello @djay and @ebrehault!
I'm Kumar Akshay and I'm highly interested in this project.
Please pardon me for being late with my introduction; my mid-semester exams were going on. I promise this will not be the case if I get the chance to participate and contribute to Plone.
I've read the requirements and goals for this task, and I've done a few projects in this domain, so I think I can handle this project. I'm sure things will not always go my way, but I'm fairly confident I can keep them under control.
Since I've never used this platform for communication, I'm not sure where to talk to you directly.
I must also acknowledge that I have no prior GSoC experience, but I learned most of the necessary things when I was planning to contribute to Mozilla last year. I'm in the second year of my Bachelor's course. I'm from India, and I have no problem with the timezone difference.
I really hope to get familiar with the repository quickly and hope to have your support too.

Regards,
Kumar Akshay
k.akshay9721@gmail.com


https://www.linkedin.com/in/kumar-akshay-2370ab101/

Thanks for replying to my queries.
Currently, I am looking through Plone's development documentation.
Also, I have looked through some issues labelled easy, and helped to successfully close one too :slight_smile:
I'll try my best to get a solid understanding of collective.importexport and plone.restapi as soon as possible, and will try to resolve some issues as well.

@djay and @ebrehault is there anything else I should look into for this project or for contributing to Plone?

Are we going to extend the collective.importexport add-on or implement a new add-on for this purpose?

https://docs.plone.org/develop/plone/content/importexport.html
After reading this doc I have a few questions.
First, what's the problem with collective.importexport? I guess it's incomplete, as a few functions are yet to be implemented. Is that right?
The doc describes a simple JSON export that uses base64 encoding and decoding and so supports binary data export. So why not use this export.py script, since this method works with the majority of Plone (3.6+)?
@djay @ebrehault
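
For anyone following along, the base64 idea can be sketched in a few lines. This is not the export.py script from the docs, just an illustration of the technique with made-up names (`export_item`, `import_item`) and a made-up record layout.

```python
import base64
import json


def export_item(path, title, blob):
    """Serialise one item to JSON, carrying the binary payload as base64 text."""
    return json.dumps({
        "path": path,
        "title": title,
        "file": {
            "encoding": "base64",
            "data": base64.b64encode(blob).decode("ascii"),
        },
    })


def import_item(text):
    """Reverse export_item: decode the JSON and restore the original bytes."""
    record = json.loads(text)
    record["file"] = base64.b64decode(record["file"]["data"])
    return record


original = bytes(range(256))  # arbitrary binary data, including non-text bytes
exported = export_item("/files/sample.bin", "Sample", original)
restored = import_item(exported)
print(restored["file"] == original)
```

The trade-off is size: base64 text is roughly a third larger than the bytes it encodes, which is one argument for keeping large binaries in separate files next to the JSON or CSV.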

This error occurred while building collective.importexport
http://pastebin.com/pv3XUuEZ
@djay and @ebrehault can you have a look at it?

@Shriyanshagro

In general, an error like this one indicates that some other package has a requirement for the collective.z3cform.datagridfield package that it be exactly version 1.1. But collective.importexport requires that the same package have a version greater than 1.1.

In order to find out what package is requiring this version, you can run bin/buildout annotate which will not actually attempt to install anything, but will produce output that shows the configuration in use by all your various buildout sections. This can help you to find what is requiring the version of collective.z3cform.datagridfield to be precisely 1.1.

Since the command prints to standard output, you can also redirect the output to a text file for easier searching:

bin/buildout annotate > annotations.txt
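
If you prefer to search the saved file from Python rather than a text editor, a minimal sketch could look like the following. It is a plain substring search that makes no assumption about the layout of the annotate output; the helper names are made up, and the sample lines are stand-ins for file contents (the second package name is hypothetical).

```python
def find_mentions(lines, package):
    """Return (line_number, text) for every line that mentions the package."""
    needle = package.lower()
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


def search_file(path, package):
    """Run find_mentions over a saved file such as annotations.txt."""
    with open(path, encoding="utf-8") as handle:
        return find_mentions(handle, package)


# Stand-in lines so the sketch runs without a buildout.
SAMPLE = [
    "some.other.package = 2.0\n",  # hypothetical entry
    "collective.z3cform.datagridfield = 1.1\n",
]
hits = find_mentions(SAMPLE, "collective.z3cform.datagridfield")
print(hits)
```

With a real file you would call `search_file("annotations.txt", "collective.z3cform.datagridfield")` and then read the lines around each hit for context.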

Hope this helps.

c

@cewing Thanks for looking into the issue.
But this didn't actually solve the problem. I do understand the constraint conflict, but what I need to know is how to overcome it.
Two packages require the same egg at different versions?

In versions.cfg I removed the line
"collective.z3cform.datagridfield = 1.1"
and, surprisingly, this hack works: the build installs successfully on my testing site. :slight_smile:
@cewing What possible errors could this hack cause in the future? Or is this what you were suggesting to me?

@Shriyanshagro that is how you can figure out and fix the underlying problem

This version pin is probably the version from Plone itself. If the site starts without that pin in place, it may be just fine. I would suggest that, rather than removing the version in versions.cfg entirely, you replace it with a new version pin, picking the version closest to 1.1 that still fulfills the constraint of being >1.1, which is what collective.importexport requires. Looking at the Python Package Index, this would appear to be version 1.2. This should help to reduce the chance of problems.
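
The "closest version above the old pin" rule can be written down as a tiny sketch. The candidate list below is illustrative only and is not the actual release history on the Python Package Index, and the parser handles plain dotted numbers only (something like "1.2b1" would need a real version parser).

```python
def parse(version):
    """Turn '1.2' into (1, 2) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def closest_above(candidates, floor):
    """Return the smallest candidate strictly greater than `floor`, or None."""
    higher = [version for version in candidates if parse(version) > parse(floor)]
    return min(higher, key=parse) if higher else None


# Illustrative candidates, not a real release list.
candidates = ["1.0", "1.1", "1.2", "1.3", "1.10"]
chosen = closest_above(candidates, "1.1")
pin = f"collective.z3cform.datagridfield = {chosen}"
print(pin)  # collective.z3cform.datagridfield = 1.2
```

Comparing tuples rather than strings matters here: as text, "1.10" sorts before "1.2", which would pick the wrong release.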

Yes, after I removed the line from versions.cfg, the package moved to version 1.2 by itself.
But I will add the line back with the version pinned to 1.2 anyway, just to avoid any future occurrence of the problem. :wink:

Exactly. Good work!

Does collective.importexport support Archetypes or just Dexterity?
Also, how can one judge this for a given add-on?

There are many things it doesn't do.
a) It doesn't support JSON in addition to CSV.
b) It doesn't support all object data and metadata, such as ownership.
c) It doesn't support blobs or HTML content, i.e. primary fields, so you can't upload a zip file or a directory and have content updated/added based on that.
d) Because of the above, it can't be used to migrate data from one site to another.
e) Its UI could be easier.

It only supports Dexterity. That is probably OK for this project too. Plone will eventually drop support for Archetypes.

Can you explain what this means?

@djay, I think @Shriyanshagro means how can one determine if an Add-on supports Archetypes, Dexterity, or both?

Yes, that's what I mean