Does Plone have a way to parse data being uploaded?

Hello,

Years ago I had a product that allowed me to automate creating new content from Word docs that I uploaded.

One of the components I used was a parser called hachoir (hachoir on PyPI).

Hachoir parsed the uploaded file (a Word document) and read the metadata assigned when the document was created. This metadata was then used to create a new instance of a content type.

Question: since this was a long time ago, I was wondering whether Plone has a built-in way to parse uploads (i.e. read their data stream).

Thanks

Wayne Glover via Plone Community wrote at 2023-6-12 22:23 +0000:


Question: since this was a long time ago, I was wondering whether Plone has a built-in way to parse uploads (i.e. read their data stream).

What you did before will still (essentially) be possible in a modern Plone.

I doubt very much that stock Plone will know how to create
content objects from Word document metadata in the way you need it.
Thus, something of your own will be necessary.

Modern Word documents (i.e. .docx files) use an XML-based representation
(a ZIP archive of XML parts) rather than a binary one.
You might therefore be able to use a standard XML parser
to extract the data (e.g. lxml, with its built-in XSLT support).
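As a rough illustration of the "standard parser" route, here is a minimal sketch that opens a .docx as a ZIP archive and reads its core metadata part (docProps/core.xml) with Python's standard library. It uses xml.etree.ElementTree instead of lxml only so that nothing needs installing; the function name read_core_properties and the selection of fields are illustrative assumptions. It takes raw bytes, on the assumption that you already have the uploaded file's data in memory.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core-properties part (docProps/core.xml).
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative selection: our own labels mapped to qualified element names.
FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "creator": "dc:creator",
    "description": "dc:description",
    "keywords": "cp:keywords",
    "created": "dcterms:created",
}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Return selected core metadata of a .docx file, given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        # Missing elements become None instead of raising.
        result[label] = element.text if element is not None else None
    return result
```

Values come back as plain strings (for example, "created" is an ISO 8601 timestamp string), so any type conversion is left to the caller.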

python-docx might help in your use case.

See the core_properties property of its Document object.

Thanks, folks. The parser I used is still maintained and current, so I'll stick with that.

Appreciate your thoughts.

Just in case the term 'parsing' is being misunderstood here:

For some things, getting content from Word into Plone does not really require 'parsing'.
If it is mostly plain text, I have been using pandoc (https://pandoc.org/) to convert the text to Markdown for use in Plone.
It has been working fairly well (at least much better than 'using HTML').
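For the pandoc route, a sketch of calling it from Python could look like the following. It assumes the pandoc executable is installed and on PATH; the helper names pandoc_command and docx_to_markdown are hypothetical. Without an output file, pandoc writes the converted text to standard output, which is captured here.

```python
import shutil
import subprocess


def pandoc_command(docx_path: str) -> list:
    """Build the pandoc call that converts a .docx file to Markdown on stdout."""
    return ["pandoc", "--from=docx", "--to=markdown", docx_path]


def docx_to_markdown(docx_path: str) -> str:
    """Run pandoc and return the Markdown text (requires pandoc on PATH)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    completed = subprocess.run(
        pandoc_command(docx_path),
        capture_output=True,
        encoding="utf-8",  # pandoc emits UTF-8 text
        check=True,  # raise CalledProcessError if pandoc fails
    )
    return completed.stdout
```

Keeping the command construction separate makes it easy to add further pandoc options later without touching the subprocess handling.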

I have been using plone.api.content.create from the command line, but I assume it is not too difficult to do the same from a browser view (e.g. an import view).
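To connect the extracted metadata to content creation, here is a hedged sketch built around plone.api's api.content.create. The portal type "Document", the mapping of the Word subject field to the Plone description, the fallback title "Untitled import" and both helper names are illustrative assumptions. plone.api is imported inside the function because it only works within a running Plone site. If you call this from a command-line script rather than a browser view, you would typically also need to commit the transaction yourself.

```python
def create_kwargs(metadata: dict, portal_type: str = "Document") -> dict:
    """Map extracted document metadata to keyword arguments for api.content.create."""
    # Hypothetical fallback title when the Word file has none.
    title = (metadata.get("title") or "").strip() or "Untitled import"
    return {
        "type": portal_type,
        "title": title,
        # Assumption: reuse the Word 'subject' field as the Plone description.
        "description": (metadata.get("subject") or "").strip(),
    }


def create_from_metadata(container, metadata: dict, portal_type: str = "Document"):
    """Create a content object inside an existing Plone container."""
    from plone import api  # only importable inside a Plone environment

    # No id is passed, so plone.api derives one from the title.
    return api.content.create(container=container, **create_kwargs(metadata, portal_type))
```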