[Solved] Removing html.escape?

I had to monkeypatch plone.app.layout.viewlets.common.TitleViewlet.page_title to remove the use of html.escape, which resulted in weird, HTML-escaped titles in the browser. Is html.escape still required (in 2022)?

IIRC it is there to avoid XSS (cross-site scripting). But prove me wrong.

I did not know. I just read that escaping is useful to avoid XSS in situations with submitted data (forms, XHR), but for something like a title tag I doubt it's useful.

Title is in a list of "Safe HTML Attributes":
https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
Does that mean title tags should still be escaped, or that, because they are "safe", no escaping is required?
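Note that the OWASP list refers to the title attribute, while the page title is the content of the <title> element. A minimal stdlib sketch (the payload and the render_head helper are hypothetical) shows why unescaped text there can still be risky: a value containing </title> closes the element early, and whatever follows is parsed as ordinary markup. html.escape turns < and > into entities, so the whole value stays inside the title as plain text.

```python
import html

# Hypothetical page title supplied by an editor: it tries to close the
# <title> element early and inject a script after it.
untrusted_title = 'My page</title><script>alert("XSS")</script>'


def render_head(title: str, escape: bool) -> str:
    """Build a minimal <head> fragment, inserting the title either
    verbatim (escape=False) or HTML-escaped first (escape=True)."""
    text = html.escape(title) if escape else title
    return f"<head><title>{text}</title></head>"


unsafe = render_head(untrusted_title, escape=False)
safe = render_head(untrusted_title, escape=True)

print(unsafe)  # the injected </title><script> survives as real markup
print(safe)    # < and > became &lt; and &gt;, so the browser shows only text
```

In the escaped version the only </title> left is the one the template itself emits.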

There's an example of how to abuse the title tag:
https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html
Maybe sanitizing the text of a title tag would be a better method than escaping it?

Sanitizing is always error-prone. But maybe our safe_html transform is good enough?
Best to open an issue and ping the security team for an opinion.

From what I see in plone.app.layout as used in Plone 5.2, we escape the page title in Python code, but then the title.pt template uses structure to render it. This means that a & in the title gets escaped to &amp; in Python, is inserted as-is by the template (no second round of escaping), and is displayed as & by the browser. So it should be safe against any nastiness, while still showing the title as you would want it.
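The escape-once-then-structure mechanism described above can be sketched with the standard library alone. This is an approximation: html.escape stands in for both the viewlet's escaping and the template engine's default escaping (the real template engine is not invoked), and the hypothetical shown_in_browser helper uses html.unescape to approximate what the browser displays for the title.

```python
import html

title = "Tom & Jerry"

# Step 1: the viewlet escapes the title in Python.
escaped_once = html.escape(title)  # "Tom &amp; Jerry"

# Step 2a: the template inserts it with `structure`, i.e. verbatim.
with_structure = f"<title>{escaped_once}</title>"

# Step 2b: a template that escapes again (no `structure`) doubles it up.
without_structure = f"<title>{html.escape(escaped_once)}</title>"


def shown_in_browser(markup: str) -> str:
    """Rough approximation of the text a browser displays for <title>...</title>."""
    inner = markup[len("<title>"):-len("</title>")]
    return html.unescape(inner)


print(shown_in_browser(with_structure))     # Tom & Jerry
print(shown_in_browser(without_structure))  # Tom &amp; Jerry
```

The second case is what produces HTML-escaped text showing up literally in the browser tab.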

There may be other places where this is not the case, and which may be tricky to fix without reintroducing security problems. See this bug report:

I chose to use SafeHTML().scrub_html() from Products.PortalTransforms.transforms.safe_html as a sanitizer; it should be good enough. More advice is welcome.

Edit: I switched to BeautifulSoup for sanitizing, using get_text().
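For reference, a dependency-free sketch of the same tag-stripping idea, swapping BeautifulSoup for the standard library's html.parser.HTMLParser. The _TextExtractor class and plain_title helper are hypothetical names, and skipping script/style content is a choice made here, not something the thread specifies. Because the extracted text is re-escaped at the end, it stays safe even if a template still renders it with structure.

```python
import html
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content only, dropping all tags (similar in spirit
    to BeautifulSoup's get_text()). Text inside script/style is skipped."""

    _SKIP = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; in the text.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names lowercased.
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def plain_title(raw: str) -> str:
    """Hypothetical helper: strip markup from a title, then re-escape the
    plain text so it is safe to insert with `structure`."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered trailing text
    return html.escape("".join(parser.parts))
```

For example, plain_title('Tom &amp; <b>Jerry</b>') returns 'Tom &amp; Jerry', which the browser displays as Tom & Jerry.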