When MS Office documents are uploaded, I would like Plone to automatically set the embedded cover page image as the document's lead image and copy over the Dublin Core document metadata.
I am hoping to accomplish the extraction by first extending openxmllib and perhaps Products.OpenXml.
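To make the extraction step concrete, here is a minimal sketch using only the Python standard library rather than openxmllib, whose API I am not assuming here. It relies on OOXML files being zip packages that usually carry their core properties in docProps/core.xml. The thumbnail path is an assumption: as far as I know the preview is conventionally stored as docProps/thumbnail.* and is only present if the file was saved with a thumbnail. The function names and the selection of fields are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in OOXML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Result key -> element path in core.xml (a subset, chosen for illustration).
CORE_FIELDS = {
    "title": "dc:title",
    "description": "dc:description",
    "creator": "dc:creator",
    "subject": "dc:subject",
    "keywords": "cp:keywords",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def extract_core_properties(data):
    """Return the non-empty core properties of an OOXML file given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        try:
            root = ET.fromstring(package.read("docProps/core.xml"))
        except KeyError:  # the core properties part is optional
            return {}
    found = {}
    for key, path in CORE_FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text and element.text.strip():
            found[key] = element.text.strip()
    return found


def extract_thumbnail(data):
    """Return (part_name, image_bytes) for the embedded thumbnail, or None.

    Assumption: the thumbnail lives at docProps/thumbnail.<ext>. A stricter
    version would resolve it through the package relationships instead.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for name in package.namelist():
            if name.lower().startswith("docprops/thumbnail."):
                return name, package.read(name)
    return None
```

Both functions take the raw file bytes, so they could be called on whatever the upload stores without touching the file system.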
Then, to store the document, I am thinking a Dexterity content type (say, "Document") based on the plone.app.contenttypes File type (with the IDublinCore & ILeadImage behaviors enabled) would be sufficient. From there, I assume I could subscribe to IObjectCreatedEvent to populate the lead (cover) image & metadata fields. At least initially, all fields holding copied-over data would be read-only.
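Roughly, the subscriber I have in mind would look like the sketch below. It is deliberately written without Plone imports so it can be tried in isolation: the extraction functions and the image factory are injected as hypothetical callables, and the attribute names (file, title, description, creators, image) are my assumptions about how the File type and the two behaviors store their values. Whether the file data is already present when IObjectCreatedEvent fires on every upload path (WebDAV in particular) is something I have not verified.

```python
def make_populate_handler(extract_metadata, extract_cover, make_image):
    """Build an (obj, event) subscriber from three injected callables.

    All three are hypothetical hooks:
      extract_metadata(data) -> dict with optional title/description/creator
      extract_cover(data)    -> (part_name, image_bytes) or None
      make_image(data, filename) -> value for the lead image field; in Plone
        this would presumably wrap plone.namedfile.file.NamedBlobImage.
    """

    def populate_from_upload(obj, event):
        blob = getattr(obj, "file", None)  # assumed name of the File field
        if blob is None:
            return
        data = blob.data

        # Extracted values win, since the copied fields are meant to be
        # read-only; fields without an extracted value are left alone.
        metadata = extract_metadata(data)
        if metadata.get("title"):
            obj.title = metadata["title"]
        if metadata.get("description"):
            obj.description = metadata["description"]
        if metadata.get("creator"):
            obj.creators = (metadata["creator"],)

        cover = extract_cover(data)
        if cover is not None:
            part_name, image_bytes = cover
            filename = part_name.rsplit("/", 1)[-1]
            obj.image = make_image(image_bytes, filename)  # assumed field name

    return populate_from_upload


# Registration would go in configure.zcml, roughly (names hypothetical):
#   <subscriber
#       for=".interfaces.IOfficeDocument
#            zope.lifecycleevent.interfaces.IObjectCreatedEvent"
#       handler=".subscribers.populate_from_upload" />
```

Injecting the callables keeps the handler independent of the file format, so the same subscriber could serve other document types later.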
Are there better approaches, or pieces I am missing?
It would be nice if one could upload multiple documents at once. I seem to remember Plone 5 has this feature built in? Even better, I would like this to support bulk file management via WebDAV. However, I found that when uploading files to Plone (4.3) via WebDAV, file names with non-ASCII characters always seem to get rejected. Is that a known problem, or should I blame my WebDAV client (ExpanDrive on OS X) for that? Are there any existing solutions?
Later, it would of course (?) make sense for a Document type like the one described here to have pluggable support for other document formats (PDF to start with). If someone with more experience could suggest a design for that, I'll be glad to use it. I can put in an interface & an overridable adapter for cover & DC metadata extraction, but I am guessing that is not enough. Or perhaps it is, if indexing, MIME type support, etc. were provided by other third-party packages, similar to how Products.OpenXml provides those for MS Office XML docs?
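To illustrate the kind of pluggability I mean, here is a small sketch with one extractor class per MIME type and a lookup function. In Plone this would more naturally be a zope.interface Interface with named adapters or utilities keyed by MIME type; the plain dict registry below is only a stand-in so the example runs on its own. All names are hypothetical, and the PDF extractor is an empty stub rather than a real implementation.

```python
# MIME type -> extractor class. Stand-in for a component-registry lookup.
_EXTRACTORS = {}


def register_extractor(mime_type):
    """Class decorator that registers an extractor for one MIME type."""
    def decorator(cls):
        _EXTRACTORS[mime_type] = cls
        return cls
    return decorator


class DocumentExtractor:
    """Contract each format plugin would implement (hypothetical)."""

    def __init__(self, data):
        self.data = data  # raw bytes of the uploaded file

    def metadata(self):
        """Return a dict of Dublin Core style values."""
        raise NotImplementedError

    def cover(self):
        """Return (filename, image_bytes) or None."""
        raise NotImplementedError


def get_extractor(mime_type, data):
    """Return an extractor instance for mime_type, or None if unsupported."""
    cls = _EXTRACTORS.get(mime_type)
    return cls(data) if cls is not None else None


@register_extractor("application/pdf")
class PdfExtractor(DocumentExtractor):
    """Stub only: a real version would parse the PDF info dictionary."""

    def metadata(self):
        return {}

    def cover(self):
        return None
```

A third-party package could then add a format just by registering its own class, and uploads with an unknown MIME type would simply skip extraction.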
Any comments & suggestions appreciated.