Years ago I had a product that automated the creation of new content from Word documents that I uploaded.
One of the components I used was a parser called hachoir (available on PyPI).
Hachoir parsed the uploaded file (a Word document) and read the metadata assigned when the document was created. That metadata was then used to create a new instance of a content type.
My question: since this was a long time ago, does Plone now have a built-in way to parse uploads (i.e. read the uploaded data stream)?
Wayne Glover via Plone Community wrote at 2023-6-12 22:23 +0000:
What you did before will still (essentially) be possible in a modern Plone.
I doubt very much that stock Plone will know how to create
content objects from Word document metadata in the way you need it.
Thus, something of your own will be necessary.
Modern Word documents (i.e. .docx files) use an XML-based
representation (a ZIP archive of XML parts) rather than a binary one;
the document metadata is stored in the docProps/core.xml part.
You might be able to use a standard XML parser
to extract the data (e.g. lxml, with its built-in XSLT support).
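A minimal sketch of that approach, assuming the upload arrives as raw bytes and that the metadata sits at the conventional `docProps/core.xml` path (strictly, the package relationships in `_rels/.rels` point to it, but Word uses that path). It uses the standard library's `xml.etree.ElementTree` instead of lxml to stay dependency-free; `lxml.etree` accepts the same `find(path, namespaces)` calls. The function name `read_docx_core_properties` and the `FIELDS` mapping are my own, not part of Plone or hachoir.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used in the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly names -> qualified child elements of <cp:coreProperties>.
FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "creator": "dc:creator",
    "description": "dc:description",
    "keywords": "cp:keywords",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_docx_core_properties(data: bytes) -> dict:
    """Return the core metadata of a .docx file, given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        try:
            # Assumption: core properties live at the conventional path.
            xml_bytes = archive.read("docProps/core.xml")
        except KeyError:
            # Some generators omit the core-properties part entirely.
            return {}
    root = ET.fromstring(xml_bytes)
    result = {}
    for name, path in FIELDS.items():
        element = root.find(path, NS)
        # Skip fields that are absent or empty.
        if element is not None and element.text:
            result[name] = element.text.strip()
    return result
```

The returned dict (e.g. `{"title": ..., "keywords": ...}`) can then be mapped onto the fields of whatever content type you create.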
For some tasks, getting content from Word into Plone does not really require 'parsing'.
If the content is mostly plain text, I have been using pandoc ( https://pandoc.org/ ) to convert it to Markdown for Plone.
It has been working fairly well (at least, much better than 'using html').
I have been calling plone.api.content.create from the command line, but I assume it would not be too difficult to do the same from a browser view (e.g. an import view).
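If you want to drive pandoc from Python (for example on an uploaded file), a minimal sketch might look like this. It assumes a `pandoc` executable on the PATH; the helper names are hypothetical. The upload is written to a temporary file rather than piped on stdin, to avoid depending on how a given pandoc version handles binary input on stdin. `--from=docx` and `--to=markdown` are standard pandoc options.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def build_pandoc_command(source: str, pandoc: str = "pandoc") -> List[str]:
    """Argument list for converting a .docx file to Markdown with pandoc."""
    return [pandoc, "--from=docx", "--to=markdown", source]


def docx_bytes_to_markdown(data: bytes, pandoc: str = "pandoc") -> str:
    """Convert raw .docx bytes (e.g. an upload) to Markdown text."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write the upload to disk so pandoc can read it as a normal file.
        source = Path(tmp) / "upload.docx"
        source.write_bytes(data)
        completed = subprocess.run(
            build_pandoc_command(str(source), pandoc),
            check=True,          # raise CalledProcessError if pandoc fails
            capture_output=True,
            encoding="utf-8",    # pandoc writes UTF-8 output
        )
    return completed.stdout
```

A failed conversion raises `subprocess.CalledProcessError` (its `stderr` holds pandoc's message); a missing executable raises `FileNotFoundError`.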
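A rough sketch of the creation step, which could be called from a command-line script or from a browser view. It takes a plain dict of metadata (for example, title and keywords read from the Word file) plus optional Markdown text. Assumptions, all to be checked against your site: the target is a stock `Document` type with the standard title/description, categorization (`subjects`) and rich-text (`text`) behaviors, and Markdown is enabled as a text format, with `text/x-web-markdown` as its MIME type. The helper names and the "Untitled import" fallback title are hypothetical. Only `build_create_kwargs` runs without Plone; `create_from_docx` needs a running site.

```python
from typing import Optional


def build_create_kwargs(props: dict, markdown: Optional[str] = None) -> dict:
    """Map document metadata onto keyword arguments for content creation.

    Field names assume a stock Plone Document (hypothetical mapping).
    """
    kwargs = {"type": "Document", "title": props.get("title") or "Untitled import"}
    if props.get("description"):
        kwargs["description"] = props["description"]
    # Word may separate keywords with commas or semicolons; accept both.
    keywords = props.get("keywords", "")
    subjects = tuple(k.strip() for k in keywords.replace(";", ",").split(",") if k.strip())
    if subjects:
        kwargs["subjects"] = subjects
    if markdown is not None:
        kwargs["markdown"] = markdown  # converted to a RichTextValue later
    return kwargs


def create_from_docx(container, props: dict, markdown: Optional[str] = None):
    """Create a content object in `container` (requires a running Plone site)."""
    # Imported lazily so the mapping above can be used without Plone.
    from plone import api
    from plone.app.textfield.value import RichTextValue

    kwargs = build_create_kwargs(props, markdown)
    text = kwargs.pop("markdown", None)
    if text is not None:
        # Assumed MIME types: Markdown in, safe HTML out.
        kwargs["text"] = RichTextValue(text, "text/x-web-markdown", "text/x-html-safe")
    return api.content.create(container=container, safe_id=True, **kwargs)
```

`safe_id=True` asks plone.api to choose a non-conflicting id if one derived from the title already exists in the container.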