Portal transform on rendered content

Hello, everyone!

I was trying to write a transform that would act on rendered content, but I was unable to identify the Interface(s?) that I would have to register against to get only the rendered content of a Plone page. Here's what I tried:

test_transform.py

from zope.component import adapter
from zope.interface import implementer
from zope.interface import Interface
from bs4 import BeautifulSoup
from plone import api
from plone.transformchain.interfaces import ITransform
import logging
from collective.bbcodesnippets.interfaces import IBBCodeSnippetsLayer

logger = logging.getLogger(__name__)


@implementer(ITransform)
@adapter(Interface, IBBCodeSnippetsLayer)
class TestTransform(object):
    order = 1000

    def __init__(self, published, request):
        self.published = published
        self.request = request

    def transformBytes(self, result, encoding):
        logger.info("transformBytes called for %r", self.published)
        return self._transform(result, encoding)

    def transformUnicode(self, result, encoding):
        logger.info("transformUnicode called for %r", self.published)
        return self._transform(result, encoding)

    def transformIterable(self, result, encoding):
        logger.info("transformIterable called for %r", self.published)
        return [self._transform(s, encoding) for s in result]

    def _transform(self, html, encoding):
        return html.upper()

test_transform.zcml

<configure xmlns="http://namespaces.zope.org/zope"
           xmlns:browser="http://namespaces.zope.org/browser"
           xmlns:plone="http://namespaces.plone.org/plone">

<adapter 
    name="test_transform" 
    factory=".test_transform.TestTransform"
    />

</configure>

The @adapter(...) was a pure shot in the dark, because I have no idea which interface I might actually need to define here in order to get all HTML, like BBCode Snippets seemingly do...

Can anyone show me how to do this in Plone 6.x?
Gg

Gogobd via Plone Community wrote at 2023-4-20 15:25 +0000:

I was trying to write a transform that would act on rendered content, but I was unable to identify the Interface(s?) that I would have to register against to get only the rendered content of a Plone page.

You cannot use an interface to select "rendered content".

What you are looking for is not a "portal transform" but an
output transform. Those transforms are activated after the normal
request response has been produced, and they can modify that response.
Thus, they see the rendered result.

I suggest you try to find corresponding documentation.
An example usage is Plone's CSRF (= "Cross Site Request Forgery") protection
(--> plone.protect).
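
To make the idea of an output transform more concrete, here is a minimal, framework-free sketch of how a chain of such transforms could be applied to an already rendered response. It only models the behaviour as I understand it from the plone.transformchain documentation: transforms run in ascending order, the method is picked by the type of the result, and returning None leaves the result unchanged. The names apply_chain, UppercaseTransform and NoOpTransform are hypothetical and not part of Plone.

```python
class UppercaseTransform:
    """Toy output transform: upper-cases whatever was rendered."""

    order = 1000

    def __init__(self, published, request):
        self.published = published
        self.request = request

    def transformBytes(self, result, encoding):
        return result.decode(encoding).upper().encode(encoding)

    def transformUnicode(self, result, encoding):
        return result.upper()

    def transformIterable(self, result, encoding):
        return [
            self.transformBytes(chunk, encoding)
            if isinstance(chunk, bytes)
            else self.transformUnicode(chunk, encoding)
            for chunk in result
        ]


class NoOpTransform(UppercaseTransform):
    """Toy transform that declines to change anything by returning None."""

    order = 500

    def transformBytes(self, result, encoding):
        return None

    transformUnicode = transformIterable = transformBytes


def apply_chain(transforms, result, encoding="utf-8"):
    """Simplified stand-in for the real chain (an assumption, not Plone code)."""
    for transform in sorted(transforms, key=lambda t: t.order):
        if isinstance(result, bytes):
            new_result = transform.transformBytes(result, encoding)
        elif isinstance(result, str):
            new_result = transform.transformUnicode(result, encoding)
        else:
            new_result = transform.transformIterable(result, encoding)
        if new_result is not None:  # None means "leave the response as it is"
            result = new_result
    return result


chain = [UppercaseTransform(None, None), NoOpTransform(None, None)]
print(apply_chain(chain, "<p>hello</p>"))
print(apply_chain(chain, [b"<p>", b"hello</p>"]))
```

The point of the sketch is that the transform never selects "rendered content" through an interface; it simply receives whatever the publisher produced, after rendering is finished.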

That's what I do in Plone 6 Classic UI:

<configure
    xmlns="http://namespaces.zope.org/zope"
    xmlns:genericsetup="http://namespaces.zope.org/genericsetup"
    xmlns:i18n="http://namespaces.zope.org/i18n"
    xmlns:plone="http://namespaces.plone.org/plone"
    i18n_domain="my.addon">
  <adapter
      name="transforms.remove_script_tag"
      for="* *"
      factory=".transforms.RemovePloneProtectScriptTransform" />
</configure>

# -*- coding: utf-8 -*-
# transforms.py
from plone.transformchain.interfaces import ITransform
from zope.interface import implementer


@implementer(ITransform)
class RemovePloneProtectScriptTransform(object):

    order = 10000

    def __init__(self, published, request):
        self.published = published
        self.request = request

    def transformBytes(self, result, encoding):
        return result

    def transformUnicode(self, result, encoding):
        return result

    def transformIterable(self, result, encoding):
        actual_url = self.request.get("ACTUAL_URL")
        if actual_url and "++plone++my.addon/templates" in actual_url:
            for node in result.tree.xpath("//script"):
                node.getparent().remove(node)
        return result

from lxml import html

from repoze.xmliter.serializer import XMLSerializer
from repoze.xmliter.utils import getHTMLSerializer

from zope.component import adapter
from zope.interface import Interface
from zope.interface import implementer

from plone.transformchain.interfaces import ITransform

from mypackage.theme.interfaces import IMyPackageLayer


@implementer(ITransform)
@adapter(Interface, IMyPackageLayer)  # any context, IMyPackageLayer request layer
class OptimizeTransform(object):
    """Fiddle with the HTML output before sending to client."""

    order = 99999  # run last

    def __init__(self, published, request):
        self.published = published
        self.request = request

    def transformBytes(self, result, encoding):
        """Copied from plone.protect.auto.ProtectTransform"""
        result = result.decode(encoding, "ignore")
        return self.transformIterable([result], encoding)

    def transformString(self, result, encoding):
        """Copied from plone.protect.auto.ProtectTransform"""
        return self.transformIterable([result], encoding)

    def transformUnicode(self, result, encoding):
        """Copied from plone.protect.auto.ProtectTransform"""
        return self.transformIterable([result], encoding)

    def transformIterable(self, result, encoding):
        """Apply the transform"""

        # primitive mobile check
        is_mobile = False
        ua = self.request.environ.get("HTTP_USER_AGENT", "")  # header may be absent
        if ua:
            if "Mobile" in ua or "Tablet" in ua or "Mobi" in ua:
                is_mobile = True

        # only modify HTML (returning None leaves the response unchanged)
        contentType = self.request.response.getHeader("Content-Type")
        if contentType is None or not contentType.startswith("text/html"):
            return

        # don't modify compressed content
        skip = ["gzip", "deflate", "compress"]
        contentEncoding = self.request.response.getHeader("Content-Encoding")
        if contentEncoding and contentEncoding in skip:
            return

        if result:
            if not isinstance(result, XMLSerializer):
                result = getHTMLSerializer(result, pretty_print=False)
                result.serializer = html.tostring

            on_management_page = result.tree.xpath(
                "//link[contains(@href, 'manage_page_style.css')]")

            if on_management_page:
                return

            body = result.tree.find("body")

            ...

        return result

Please note that getHTMLSerializer is important to ensure well-formed HTML. Sometimes the output from Plone (such as that from the safe_html transform) is serialized as XML, which can introduce self-closing tags that make the output invalid HTML: for example, <script src=""></script> becomes <script src="" /> and <iframe src=""></iframe> becomes <iframe src="" />.
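
The effect of XML versus HTML serialization can be reproduced with the standard library alone. The sketch below uses xml.etree.ElementTree as a stand-in for lxml and repoze.xmliter (that substitution is my assumption, chosen so the example runs without extra packages); the file names a.js and b.html are made up.

```python
import xml.etree.ElementTree as ET

# Parse a small, well-formed fragment with explicitly closed tags.
root = ET.fromstring(
    '<div><script src="a.js"></script><iframe src="b.html"></iframe></div>'
)

# XML serialization collapses empty elements into self-closing tags.
as_xml = ET.tostring(root, encoding="unicode", method="xml")

# HTML serialization keeps the explicit closing tags for these elements.
as_html = ET.tostring(root, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

Browsers do not treat a self-closing script or iframe tag as closed, which is why the serializer matters here.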

David via Plone Community wrote at 2023-5-17 22:35 +0000:

...
@implementer(ITransform)
@adapter(Interface, IMyPackageLayer)  # any context, IMyPackageLayer request layer
class OptimizeTransform(object):
    ...
    def transformString(self, result, encoding):
        """Copied from plone.protect.auto.ProtectTransform"""
        return self.transformIterable([result], encoding)

    def transformUnicode(self, result, encoding):
        """Copied from plone.protect.auto.ProtectTransform"""
        return self.transformIterable([result], encoding)

    def transformIterable(self, result, encoding):
        """Apply the transform"""

The code above and the method name suggest that result is an iterable,
likely a list and not a string.

   ...
   # skip ZMI
   if "manage_page_style.css" in result:

This looks wrong: apparently, it assumes that result is a string.
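
The difference can be shown in a few lines. This is only a sketch, assuming result is a plain list of str or bytes chunks; contains_marker is a hypothetical helper, not something from Plone.

```python
def contains_marker(result, marker, encoding="utf-8"):
    """Return True if marker occurs anywhere in the response chunks.

    The chunks are joined first so that a marker split across two
    chunks is still found.
    """
    parts = [
        chunk.decode(encoding, "ignore") if isinstance(chunk, bytes) else chunk
        for chunk in result
    ]
    return marker in "".join(parts)


chunks = ['<link href="manage_page_', 'style.css" />']

# On a list, "in" compares against whole items, so this is False.
naive = "manage_page_style.css" in chunks

# Searching the joined text finds the marker.
found = contains_marker(chunks, "manage_page_style.css")

print(naive, found)
```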

Thank you for pointing that out. I have edited my post to remove that part.