Outputfilters, PortalTransforms, and html filtering

flipmcf · July 20, 2021, 5:56pm

Hey Folks. This is another deep-dive for me.

I've been inside Products.PortalTransforms and plone.outputfilters before, but this one is stumping me.

We have a media button in TinyMCE that places a video into a rich text field. Nothing Fancy.
We stick some HTML in there for the user.

<p> blah blah blah </p>
<p><video width="300" height="150">
<source src="https://www.rfa.org/english/video?v=1_3g5xp97h" /></video></p>
<p>blah blah blah</p>

plone.outputfilters handles the rest. The solution is identical to 'resolve_uid_and_caption' except we grab that video tag and replace it with the real video.
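As a rough illustration of that replacement step, here is a minimal, self-contained sketch. It does not use the real plone.outputfilters API: `VIDEO_RE`, `render_video` and `video_filter` are hypothetical names, the regex is a simplification of real HTML parsing, and the rendered `<div>` is only a stand-in for whatever the view and template actually produce.

```python
import re
from html import escape

# Matches a placeholder <video> element and captures the src of its <source> child.
# A regex is used only to keep the sketch short; a real filter would parse the HTML.
VIDEO_RE = re.compile(
    r'<video\b[^>]*>\s*<source\s+src="(?P<src>[^"]+)"\s*/?>\s*</video>',
    re.IGNORECASE | re.DOTALL,
)


def render_video(src):
    """Hypothetical stand-in for the view/template that renders the real player."""
    return '<div class="video-player" data-src="{}"></div>'.format(
        escape(src, quote=True)
    )


def video_filter(html):
    """Replace each placeholder <video> tag with the rendered player markup."""
    return VIDEO_RE.sub(lambda match: render_video(match.group("src")), html)
```

Everything outside the matched `<video>` element passes through untouched, which is the same contract the caption filter follows for images.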

Things worked fine until we HAD TO start using javascript to embed the video.

HTML filtering is a good thing. We do not want users typing javascript into the source of a rich text field and having it rendered. We don't want to turn that off. 'script' is not in the allowed tags list, and it will probably stay in the nasty tags list.

To embed the video with javascript, we maintain that javascript in a view and template, just like the ones registered in plone.outputfilters/browser/configure.zcml. The javascript goes into video_output.pt and gets all its info/options/variables from the filter adapter.

<browser:page
        name="video_output"
        for="*"
        class=".video_output.RichTextVideoView"
        template="video_output.pt"
        permission="zope.Public"
        layer="mycompany.myclient.interfaces.IMyClientsVerySpecialLayer"
    />

plone.outputfilters is, IMO, amazingly built and very flexible.

This is fantastic except for one annoying part: plone.outputfilters is a transform POLICY. It's not actually part of a transform chain.

As the docs say: plone.outputfilters · PyPI

plone.outputfilters hooks into the PortalTransforms machinery by installing:

1. a new mimetype (“text/x-plone-outputfilters-html”)
2. a transform from text/html to text/x-plone-outputfilters-html
3. a null transform from text/x-plone-outputfilters-html back to text/html
4. a “transform policy” for the text/x-html-safe mimetype, which says that text being transformed to text/x-html-safe must first be transformed to text/x-plone-outputfilters-html

The filter adapters are looked up and applied during the execution of the transform from step #2.

This should be considered an implementation detail and may change at some point in the future.

What this means is that plone.outputfilters is run BEFORE safe-html. Any javascript added from a plone.outputfilters template is removed by the safe-html transform.

In effect, the conversion is 'text/html' -> [required transforms: (outputfilters)] -> 'text/x-html-safe', so any markup produced by outputfilters is treated as unsafe and stripped.
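The ordering problem can be reproduced with a toy model. This is only a sketch of the idea, not the PortalTransforms implementation: `POLICIES`, `TRANSFORMS` and `convert_to` are hypothetical names, and both transforms are deliberately naive stand-ins.

```python
import re


def outputfilters(html):
    """Stand-in for the outputfilters step: expands a placeholder into an embed."""
    return html.replace("<video></video>", '<script src="player.js"></script>')


def safe_html(html):
    """Stand-in for safe_html: drops <script> elements entirely."""
    return re.sub(
        r"<script\b.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL
    )


# A policy maps a target mimetype to transforms that must run BEFORE it.
POLICIES = {"text/x-html-safe": [outputfilters]}
TRANSFORMS = {"text/x-html-safe": safe_html}


def convert_to(target, html):
    """Apply the required transforms first, then the transform for the target."""
    for required in POLICIES.get(target, []):
        html = required(html)
    return TRANSFORMS[target](html)
```

With this model, `convert_to("text/x-html-safe", "<p>a</p><video></video>")` returns only `<p>a</p>`: the script that the filter generated is removed by the sanitizer that runs after it.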

I would like to argue that html generated by outputfilters IS SAFE and does not need filtering.

(yea, a plugin could write something to inject unsanitized input through an output filter, but that's a self-inflicted injury at that point I think)

If somehow, plone could do a transform like 'text/html' -> 'text/x-html-safe' -> 'text/x-html-safe-and-outputfiltered' and remove outputfilters as a policy we could make this work for all of plone.
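A toy version of that proposed ordering, under the same kind of simplifying assumptions: the function names are hypothetical, the sanitizer is a naive regex, and the placeholder `<video>` tag is assumed to be allowed through the sanitizer.

```python
import re


def safe_html(html):
    """Toy sanitizer: strips <script> elements. <video> is assumed to be allowed."""
    return re.sub(
        r"<script\b.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL
    )


def outputfilters(html):
    """Toy output filter: expands the placeholder into trusted, generated markup."""
    return html.replace("<video></video>", '<script src="player.js"></script>')


def render_rich_text(html):
    """Sanitize the user's markup first, then apply the trusted output filters."""
    return outputfilters(safe_html(html))
```

Here a script typed by the user is still removed, while the script generated by the filter survives because nothing sanitizes the output after the filter has run.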

I don't know how to set up transform chains. I've never played with them, so I'm not sure whether they would help here or not.

The above is going to probably take a PLIP and a rewrite of a fair chunk of outputfilters.

Any other more hacky thoughts on how to accomplish the requirement?

Rich Text Fields need to be sanitized with html filters
plone.outputfilters replacements in Rich Text Fields should not be filtered

Thanks for applying a few of your neurons on this.

1letter · July 20, 2021, 6:11pm

I use a custom transform of output

flipmcf · July 22, 2021, 3:08pm

I see. This makes sense.

I am mad because I have a lot of work to do to move something from outputfilters into plone.transformchain. The implementation is much bigger than I described.

To state it simply, we request different mimetypes for richtext based on the request. If we see a google amp request, for example, we apply the IGoogleAmpRequest marker interface to the request and treat it as a browser layer. That layer requests the text field with a different output mimetype and we trigger a different outputfilter. I really should write this model up separately. I'm a fan of it.
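Roughly, the model looks like the sketch below. It is plain Python so that it runs on its own: in a real site the marker would be a zope.interface Interface applied with `alsoProvides`, and the detection rule, the `Request` class and the `text/x-html-amp` mimetype are all hypothetical stand-ins.

```python
class IGoogleAmpRequest:
    """Stand-in marker; in Zope this would be a zope.interface Interface."""


class Request:
    """Minimal request object that can carry marker classes."""

    def __init__(self, path):
        self.path = path
        self.markers = set()


def mark_request(request):
    """Hypothetical detection rule: treat /amp/ URLs as Google AMP requests."""
    if request.path.startswith("/amp/"):
        request.markers.add(IGoogleAmpRequest)


# Marker -> output mimetype requested from the rich text field (hypothetical value).
OUTPUT_MIMETYPES = {IGoogleAmpRequest: "text/x-html-amp"}
DEFAULT_MIMETYPE = "text/x-html-safe"


def output_mimetype_for(request):
    """Pick the output mimetype based on the markers present on the request."""
    for marker, mimetype in OUTPUT_MIMETYPES.items():
        if marker in request.markers:
            return mimetype
    return DEFAULT_MIMETYPE
```

Each output mimetype can then have its own policy and filters, so the same stored rich text renders differently per request type.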

It's less work for us to find a way to make outputfilters run after the safe_html transform. I'm still hoping for that. This would be a change to plone.outputfilters, tho.

I'm not opposed to moving this work to plone.transformchain - I'm certain I can accomplish the same thing when the transformchain looks up named browserview adapters to get views and templates like outputfilters does. I just wish there was a simpler solution without a re-write.

plone.outputfilters seems to hint that it expected such changes in the future:

"This should be considered an implementation detail and may change at some point in the future."

I think the future is now.

cdw9 · July 26, 2021, 12:55pm

We have a site that uses uwosh.snippets · PyPI for this sort of thing. In addition to the add-on, we created a custom HTML Snippet content type with a text area for javascript, and the type is restricted so only site admins are allowed to add it. Content editors are then able to add the snippet into their pages.

I think I also had to customize uwosh.snippets a little bit to make it work, and I don't know if it's compatible with Plone 5.2 (I haven't tried it). But I can get you more information if you are interested.