I am writing some code for a Dexterity content type.
It has a field called "arquivo", a plone.namedfile.field.NamedBlobFile that would (typically) store searchable PDF files.
Now I'd like to manipulate the contents of the file a bit before they populate the SearchableText index, so I created my custom SearchableText method in the content type class.
What I would like to do is to use portal_transforms to transform the pdf contents to plain text before I manipulate it.
Here's my code:
    if self.arquivo is not None:
        transforms = get_tool(name='portal_transforms')
        stream = transforms.convertTo('text/plain', self.arquivo.data)
        content = stream.getData().strip()
The "content" variable would hold the transformed text. As of now, the content variable is returned by SearchableText, so I can see the results in the portal catalog (via ZMI).
The above code works for plain-text files, but when I try to save the item with a PDF file, I get this message:
AttributeError: 'NoneType' object has no attribute 'getData'
Is there something I am missing here? Please note that the portal_transforms tool has the pdf_to_text and pdf_to_html transforms registered.
You have to debug what's going on inside the convertTo method:
    def convertTo(self, target_mimetype, orig, data=None, object=None,
                  usedby=None, context=None, **kwargs):
        """Convert orig to a given mimetype

        * orig is an encoded string

        * data an optional IDataStream object. If None a new datastream will be
        created and returned

        * optional object argument is the object on which is bound the data.
        If present that object will be used by the engine to bound cached data.

        * additional arguments (kwargs) will be passed to the transformations.
        Some usual arguments are : filename, mimetype, encoding

        return an object implementing IDataStream or None if no path has been
        found.
        """
        target_mimetype = str(target_mimetype)
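Since convertTo can return None when no transform path is available, the calling code probably should not call getData() unconditionally. Below is a minimal sketch of a guarded version. The helper name extract_plain_text is hypothetical, the tool is passed in as an argument so the function does not depend on how you look it up, and it assumes the field value exposes the usual NamedBlobFile attributes data, contentType and filename. Passing mimetype and filename as keyword arguments follows the "usual arguments" mentioned in the docstring above; whether that is enough to select the PDF transform on a given site is an assumption.

```python
def extract_plain_text(transforms, blob):
    """Return the plain-text rendering of a NamedBlobFile-like value.

    Hypothetical helper: returns an empty string when there is no file
    or when portal_transforms finds no path to text/plain.
    """
    if blob is None:
        return ''
    stream = transforms.convertTo(
        'text/plain',
        blob.data,
        # Hints for the engine; assumed to help it pick the right transform.
        mimetype=blob.contentType,
        filename=blob.filename,
    )
    if stream is None:
        # No transform path (for example, the PDF transforms are missing).
        return ''
    return stream.getData().strip()
```

Inside SearchableText this could be called as extract_plain_text(get_tool(name='portal_transforms'), self.arquivo), so a missing transform yields an empty index entry instead of an AttributeError.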
There might be add-ons that already do what you want (?)
At least, there will probably be some code you can look at, for example:
Updating the issue:
Deleting the whole Plone site, recreating it and installing my add-on apparently solved the problem.
One point to note is that when I first created the Plone site in which I ran my first tests, poppler-utils was not yet installed, so portal_transforms didn't have the PDF transforms. When I noticed that, I installed the poppler-utils package and manually recreated the portal_transforms object. For some reason, though, I still got the error I pointed out in my first post. Recreating the Plone site from scratch solved the issue.
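For anyone hitting the same thing, a quick way to check whether the poppler-utils binary is visible to the Python process is sketched below. The helper name has_binary is hypothetical, and it assumes the pdf_to_text transform relies on the pdftotext command being on the PATH. Note that this only checks the binary; it does not tell you whether the transforms were actually registered in portal_transforms.

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    # Fallback for Python 2.7, which Plone 4.3 runs on.
    from distutils.spawn import find_executable as which


def has_binary(name):
    """Return True if an executable called `name` is found on the PATH."""
    return which(name) is not None


if __name__ == '__main__':
    # pdftotext is shipped by poppler-utils.
    print('pdftotext available: %s' % has_binary('pdftotext'))
```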
@smcmahon, I suspect that plonedev.vagrant (at least at tag 4.3.11, which I am using) does not have poppler-utils installed.
Thank you all for your replies.