Portal_transforms usage

alberto · December 21, 2016, 8:58pm

I am writing some code for a Dexterity content type.

It has a field called "arquivo", a plone.namedfile.field.NamedBlobFile, which will typically store searchable PDF files.

Now I'd like to manipulate the contents of the file a bit before they populate the SearchableText index, so I created a custom SearchableText method in the content type class.

What I would like to do is use portal_transforms to convert the PDF contents to plain text before manipulating them.

Here's my code:

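    from plone.api.portal import get_tool  # assumed source of get_tool

    # inside the custom SearchableText method of the content type class
    if self.arquivo is not None:
        transforms = get_tool(name='portal_transforms')
        stream = transforms.convertTo('text/plain', self.arquivo.data)
        content = stream.getData().strip()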

The "content" variable holds the transformed text. For now, SearchableText simply returns it, so I can inspect the results in portal_catalog via the ZMI.

The above code works for plain-text files, but when I try to save an item with a PDF file, I get this error:

    AttributeError: 'NoneType' object has no attribute 'getData'

Is there something I am missing here? Please note that the portal_transforms tool has the pdf_to_text and pdf_to_html transforms registered.

hvelarde · December 22, 2016, 11:14am

You have to debug what's going on inside the TransformEngine module:

https://github.com/plone/Products.PortalTransforms/blob/2.1.12/Products/PortalTransforms/TransformEngine.py#L80-L181

    @security.public
    def convertTo(self, target_mimetype, orig, data=None, object=None,
                  usedby=None, context=None, **kwargs):
        """Convert orig to a given mimetype

        * orig is an encoded string

        * data an optional IDataStream object. If None a new datastream will be
        created and returned

        * optional object argument is the object on which is bound the data.
        If present that object will be used by the engine to bound cached data.

        * additional arguments (kwargs) will be passed to the transformations.
        Some usual arguments are : filename, mimetype, encoding

        return an object implementing IDataStream or None if no path has been
        found.
        """
        target_mimetype = str(target_mimetype)

This file has been truncated.
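Since the docstring says convertTo returns None when no transform path is found, the AttributeError above suggests that no PDF-to-text path was available at that point. A minimal defensive sketch, assuming the field value exposes `.data`, `.contentType` and `.filename` (as plone.namedfile's NamedBlobFile does); the helper name `file_to_plain_text` is hypothetical, and the `mimetype`/`filename` keywords are the "usual arguments" the docstring lists:

```python
def file_to_plain_text(transforms, named_file):
    """Convert a file field's payload to plain text via portal_transforms.

    Hypothetical helper. `transforms` is the portal_transforms tool and
    `named_file` is a NamedBlobFile-like object exposing .data,
    .contentType and .filename. Returns an empty string instead of
    raising when convertTo finds no transform path and returns None.
    """
    if named_file is None:
        return ''
    stream = transforms.convertTo(
        'text/plain',
        named_file.data,
        # Passing the source mimetype and filename (assumed to help the
        # engine classify the input) instead of relying on the raw bytes.
        mimetype=named_file.contentType,
        filename=named_file.filename,
    )
    if stream is None:
        # No conversion path, e.g. pdf_to_text is not registered.
        return ''
    return stream.getData().strip()
```

Inside the content class, SearchableText could then call `file_to_plain_text(get_tool(name='portal_transforms'), self.arquivo)` and post-process the result; an empty string is a signal to check which transforms are actually registered.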

espenmn · December 22, 2016, 11:50am

There might already be add-ons that do what you want.

At the very least, there is probably some code you can look at; collective.documentviewer's catalog.py might be a useful example: https://github.com/collective/collective.documentviewer/blob/master/collective/documentviewer/catalog.py

alberto · December 22, 2016, 1:54pm

Updating the issue:

Deleting the whole Plone site, recreating it and installing my add-on apparently solved the problem.

One point to note is that when I first created the Plone site for my initial tests, poppler-utils was not yet installed, so portal_transforms did not have the PDF transforms. When I noticed that, I installed the poppler-utils package and manually recreated the portal_transforms object, but for some reason the error I described in my first post persisted. Recreating the Plone site from scratch solved the issue.

@smcmahon, I suspect that plonedev.vagrant (at least tag 4.3.11, which I am using) does not install poppler-utils.
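Because the PDF transforms were missing when the site was created without poppler-utils, it may help to confirm the poppler command-line tools are visible before creating (or recreating) the site. A rough sketch, assuming pdf_to_text and pdf_to_html wrap the `pdftotext` and `pdftohtml` binaries; the function name `missing_poppler_binaries` is hypothetical. `shutil.which` needs Python 3.3+, so run this with a separate Python 3 interpreter (Plone 4.3 itself runs on Python 2.7), ideally under the same user and PATH as the Zope instance:

```python
import shutil

# Command-line tools shipped by poppler-utils that the PDF transforms are
# assumed to wrap (pdf_to_text -> pdftotext, pdf_to_html -> pdftohtml).
POPPLER_BINARIES = ('pdftotext', 'pdftohtml')


def missing_poppler_binaries(binaries=POPPLER_BINARIES):
    """Return the names from `binaries` that are not found on PATH."""
    return [name for name in binaries if shutil.which(name) is None]


if __name__ == '__main__':
    missing = missing_poppler_binaries()
    if missing:
        print('Not on PATH: ' + ', '.join(missing))
    else:
        print('poppler-utils binaries found')
```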

Thank you all for your replies.