I am writing some code for a Dexterity content type.
It has a field called "arquivo", a plone.namedfile.field.NamedBlobFile, which would (typically) store searchable PDF files.
Now I'd like to manipulate the contents of the file a bit before they populate the SearchableText index, so I created my custom SearchableText method in the content type class.
What I would like to do is to use portal_transforms to transform the pdf contents to plain text before I manipulate it.
Here's my code:
if self.arquivo is not None:
    transforms = get_tool(name='portal_transforms')
    stream = transforms.convertTo('text/plain', self.arquivo.data)
    content = stream.getData().strip()
The "content" variable would hold the transformed text. As of now, the content variable is returned by SearchableText, so I can see the results in the portal catalog (via ZMI).
The above code works for plain-text files, but when I try to save the item with a PDF file, I get this message:
AttributeError: 'NoneType' object has no attribute 'getData'
Is there something I am missing here? Please note that the portal_transforms tool has the pdf_to_text and pdf_to_html transforms registered.
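For context on the traceback: convertTo returns None when the transform engine cannot find a chain of transforms from the detected source MIME type to the target type, so calling getData() on the result fails. Below is a minimal sketch of a guard around that call. The helper name to_plain_text is hypothetical, and passing the mimetype and filename hints assumes that the engine uses them to classify the input, which is how I understand Products.PortalTransforms to behave.

```python
def to_plain_text(transforms, data, mimetype=None, filename=None):
    """Return the plain-text rendering of `data`, or '' if no transform applies.

    `transforms` is expected to be the portal_transforms tool (or any object
    with a compatible convertTo method).
    """
    # The mimetype/filename hints are optional; they are meant to help the
    # engine classify the input instead of guessing from the raw bytes.
    stream = transforms.convertTo(
        'text/plain',
        data,
        mimetype=mimetype,
        filename=filename,
    )
    # convertTo() returns None when no transform path to text/plain exists,
    # e.g. when the PDF transforms are not usable on this site.
    if stream is None:
        return ''
    return stream.getData().strip()
```

In the SearchableText method this could be called as to_plain_text(transforms, self.arquivo.data, mimetype=self.arquivo.contentType, filename=self.arquivo.filename). Returning an empty string only hides the symptom; it does not make the PDF transform work.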
Deleting the whole Plone site, recreating it and installing my add-on apparently solved the problem.
One point to note: when I first created the Plone site used for my initial tests, poppler-utils was not yet installed, so portal_transforms didn't have the PDF transforms. When I noticed that, I installed the poppler-utils package and manually recreated the portal_transforms object. For some reason, the error I described in my first post still occurred. Recreating the Plone site from scratch solved the issue.
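In case it helps anyone hitting the same thing, here is a rough sketch of a check for the poppler-utils binaries before creating the site. The function name missing_pdf_tools is hypothetical, and the default tool names assume that the pdf_to_text and pdf_to_html transforms call pdftotext and pdftohtml. Note that shutil.which requires Python 3; on Python 2.7 (Plone 4.3), distutils.spawn.find_executable plays the same role.

```python
import shutil


def missing_pdf_tools(tools=('pdftotext', 'pdftohtml')):
    """Return the names from `tools` that cannot be found on PATH."""
    # shutil.which() returns None when the executable is not on PATH.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == '__main__':
    missing = missing_pdf_tools()
    if missing:
        print('Missing binaries: ' + ', '.join(missing))
    else:
        print('All PDF tools found on PATH.')
```

This only checks the PATH of the process running the script, so it should be run as the same user and in the same environment as the Zope instance.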
@smcmahon, I suspect that the plonedev.vagrant (at least in tag 4.3.11, which I am using) does not have poppler-utils installed.