Error with international characters in rich text widgets

I am working on upgrading various add-ons to Python 3 / Plone 5.2.
While working on collective.z3cform.colorpicker I discovered that I get an error on a rich text field for some content if the field contains international characters (ÆØÅ).

How do I troubleshoot this?

2019-03-05 15:41:32 ERROR Zope.SiteErrorLog 1551796892.540.618312781417 http://localhost:8086/Plone/asdfasdf/@@edit-tile/collective.themefragments.fragment/90a6808417d34569946b11312660b87a
Traceback (innermost last):
  Module ZPublisher.Publish, line 138, in publish
  Module ZPublisher.mapply, line 77, in mapply
  Module ZPublisher.Publish, line 48, in call_object
  Module plone.z3cform.layout, line 63, in __call__
  Module plone.z3cform.layout, line 57, in update
  Module plone.app.tiles.browser.base, line 105, in render
  Module z3c.form.form, line 162, in render
  Module zope.browserpage.viewpagetemplatefile, line 49, in __call__
  Module zope.pagetemplate.pagetemplate, line 137, in pt_render
  Module five.pt.engine, line 98, in __call__
  Module z3c.pt.pagetemplate, line 163, in render
  Module chameleon.zpt.template, line 261, in render
  Module chameleon.template, line 171, in render
  Module 8ddcaf7be27ded16472cf2c090a77ead.py, line 91, in render
  Module 44a2fe2566492b4630348325426a380b.py, line 1826, in render_titlelessform
  Module 44a2fe2566492b4630348325426a380b.py, line 451, in render_fields
  Module 44a2fe2566492b4630348325426a380b.py, line 126, in render_widget_rendering
  Module 44a2fe2566492b4630348325426a380b.py, line 1069, in render_field
  Module five.pt.expressions, line 161, in __call__
  Module Products.Five.browser.metaconfigure, line 485, in __call__
  Module zope.browserpage.viewpagetemplatefile, line 81, in __call__
  Module zope.browserpage.viewpagetemplatefile, line 49, in __call__
  Module zope.pagetemplate.pagetemplate, line 137, in pt_render
  Module five.pt.engine, line 98, in __call__
  Module z3c.pt.pagetemplate, line 163, in render
  Module chameleon.zpt.template, line 261, in render
  Module chameleon.template, line 191, in render
  Module chameleon.template, line 171, in render
  Module 44cc946657eb21d583772eddbcb3f200.py, line 610, in render
  Module 44cc946657eb21d583772eddbcb3f200.py, line 481, in render_widget_wrapper
  Module five.pt.expressions, line 161, in __call__
  Module plone.app.z3cform.widget, line 694, in render
  Module plone.app.widgets.base, line 334, in __init__
  Module plone.app.widgets.base, line 348, in _set_value
  Module lxml.etree, line 1020, in lxml.etree._Element.text.__set__
  Module lxml.etree, line 711, in lxml.etree._setNodeText
  Module lxml.etree, line 699, in lxml.etree._createTextNode
  Module lxml.etree, line 1439, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

 - Expression: "widget/@@ploneform-render-widget"
 - Filename:   ... rm-3.0.8-py2.7.egg/plone/app/z3cform/templates/macros.pt
 - Location:   (line 100: col 81)
 - Source:     ... place="structure widget/@@ploneform-render-widget"/>
									^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 - Expression: "widget/render"
 - Filename:   ... rm-3.0.8-py2.7.egg/plone/app/z3cform/templates/widget.pt
 - Location:   (line 39: col 46)
 - Source:     ... xt" tal:replace="structure widget/render"
											  ^^^^^^^^^^^^^
 - Arguments:  repeat: {...} (0)
			   context: <RichTextWidget xtext at 0x1115f6bd0>
			   views: <ViewMapper - at 0x10698ecd0>
			   modules: <TraversableModuleImporter - at 0x1068ba110>
			   args: <tuple - at 0x104795050>
			   nothing: <NoneType - at 0x7fff988076a8>
			   target_language: <NoneType - at 0x7fff988076a8>
			   default: <object - at 0x1047ebae0>
			   request: <instance - at 0x1104c8e18>
			   wrapped_repeat: {...} (0)
			   loop: {...} (0)
			   template: <ViewPageTemplateFile - at 0x10c82de90>
			   translate: <function translate at 0x10694d5f0>
			   options: {...} (0)
			   view: <RenderWidget ploneform-render-widget at 0x10698e710>

PS: xtext is just the name of the field (I changed it from 'text' to see if it matters).

How to reproduce with a stock Plone 5.2/Python 3 installation?

The traceback quoted above is likely the relevant part. It tells us that _set_value, at line 348 of plone.app.widgets.base, tries to construct an XML text node and passes it a value of the wrong type. lxml would accept either "unicode" or pure ASCII, but it likely gets a UTF-8 encoded binary string; or maybe something that was once such an object and got wrongly converted to unicode (e.g. during the Python 2 -> Python 3 migration of existing content) and thus contains control characters.
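To illustrate the second possibility, here is a minimal Python 3 sketch of how a wrong conversion can introduce control characters. It uses only the standard library; the helper name `control_characters` is made up for this example, and whether this is what actually happened to the content in question is an assumption.

```python
import unicodedata


def control_characters(text):
    """Return the control characters (Unicode category Cc) found in text."""
    return [ch for ch in text if unicodedata.category(ch) == "Cc"]


original = "ÆØÅ"
utf8_bytes = original.encode("utf-8")  # what a UTF-8 encoded byte string holds

# Correct conversion: decode with the encoding that produced the bytes.
good = utf8_bytes.decode("utf-8")

# Wrong conversion: treat every byte as one character (Latin-1).
# The continuation bytes 0x86, 0x98 and 0x85 become C1 control characters.
bad = utf8_bytes.decode("latin-1")

print(control_characters(good))
print(control_characters(bad))
```

Running a check like this on the stored field value in a debugger session would show which of the two cases applies.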

My favorite way to analyse exceptions is via Products.PDBDebugMode. This product is not yet Python 3/Zope 4 compatible, but I have a locally hacked version which is. Let me know whether you are interested in this version.

I think 'something has happened' with 'block'-related add-ons, or maybe plone.app.contenttypes (?)

It looks like I run into the same errors on 5.1.5 by doing the following:

  1. install a brand new site

  2. install Mosaic 2.2.1

  3. Install collective.themefragments 2.11.1

  4. Duplicate the Barceloneta theme and add a 'fragments' folder with a file, fragment.xml, that has one field:

    <model xmlns="http://namespaces.plone.org/supermodel/schema"
           xmlns:form="http://namespaces.plone.org/supermodel/form"
           xmlns:security="http://namespaces.plone.org/supermodel/security">
      <schema>
        <field name="mayfield" type="zope.schema.TextLine"></field>
      </schema>
    </model>

and a fragment.pt file with

    <div>XXX</div>

  5. Set view to Mosaic

  6. Add the fragment, add text to the field

  7. Save

(it works)

  8. Edit the page, change the fragment field to 'ÆØÅöä'

Can anyone confirm this?

Another setup, with Mosaic = 2.10 (and the suggested pins for blocks etc.) and collective.themefragments = 2.11.1 (suggested for that version), works.

UPDATE: In this case, it looks like collective.themefragments constructs a really weird URL:

'http://localhost:8080/Plone/front-page/@@collective.themefragments.fragment/b8782a5c84f54817a54d76b5dcdaaaaa?fragment=testfragment&fragment=testfragment&bannertext=This+is+a+banner\xc3\x86\xc3\x98\xc3\x85\xe2\x80\xa6'

First of all, it looks rather odd that fragment=testfragment appears twice. And yes, it is a string, so the line

response = subrequest(url, exception_handler=subresponse_exception_handler)

gives this error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 175-178: ordinal not in range(128)
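For comparison, here is a minimal Python 3 sketch of a query string that survives an ASCII encode because the non-ASCII characters are percent-encoded first. It uses only `urllib.parse.urlencode` from the standard library; the parameter values are taken from the URL above, and this is not a claim about how collective.themefragments or plone.subrequest build or should build their URLs.

```python
from urllib.parse import urlencode

base = (
    "http://localhost:8080/Plone/front-page/"
    "@@collective.themefragments.fragment/b8782a5c84f54817a54d76b5dcdaaaaa"
)

# Values as text; urlencode percent-encodes them as UTF-8 by default.
params = {"fragment": "testfragment", "bannertext": "This is a bannerÆØÅ…"}
query = urlencode(params)
url = base + "?" + query

print(query)

# The resulting URL is pure ASCII, so this no longer raises UnicodeEncodeError.
ascii_url = url.encode("ascii")

# The unquoted text, by contrast, cannot be encoded as ASCII.
try:
    "This is a bannerÆØÅ…".encode("ascii")
    raw_is_ascii = True
except UnicodeEncodeError:
    raw_is_ascii = False
```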

So I don't think the title of this thread is correct: it is not the rich text widget that is the problem, but something else…

You know that Plone 5.1 is not yet Python 3 compatible. This implies that you see the error (also) with Python 2 and Zope2 2.x. For them, you could use the stock Products.PDBDebugMode to analyse precisely what goes wrong (likely, a "decode" is missing which would convert a UTF-8 encoded str into unicode).

Meanwhile, I have noticed that the collective on GitHub has a new version of Products.PDBDebugMode that is allegedly compatible with Zope 4 (and Python 3).

Yes. I got my errors when trying to port to 5.2 & Python 3 (so I thought they were introduced there).

Since I continued fixing some things at home, I discovered that I got errors in Plone 5.1.5 too.

Finally, I have discovered that the problem comes from upgrading plone.subrequest, which is pinned to 1.8.6 (Plone 5.1.5) and 1.9.0 (Plone 5.2).

OK: plone.subrequest = 1.8.5
Not OK: plone.subrequest >= 1.8.6

So it turns out that Plone 5.2 / Python 3 is not the problem.
Now I just need to figure out what happened from 1.8.5 to 1.8.6.

PS: Thanks a lot for your time


Update: plone.subrequest thinks the URL is a six.binary_type and tries to decode it:
https://github.com/plone/plone.subrequest/blob/master/plone/subrequest/__init__.py#L81

Second update: It turns out this was not related. It came from another field on the same schema and is related to Mosaic. (In other words: fragments from Mosaic cannot contain 'international characters' unless you use plone.subrequest <= 1.8.5.)

After some digging I have found where the error comes from.

The field is defined as:

  <field name="text" type="plone.app.textfield.RichText" marshal:primary="true"
      form:widget="plone.app.z3cform.widget.RichTextFieldWidget">
  <title>Text</title>
  </field>

Everything works until I add the Norwegian character 'Å' ( &Aring; )

In plone.jsonserializer the conversion is like this:

    def richtext_converter(value, schema):
        encoding = value.get('encoding', u'utf-8')\
                        .encode('utf-8', 'ignore')
        raw = value.get('data', '').encode(encoding)
        mimeType = value.get('content-type', u'text/html')\
                        .encode('utf-8', 'ignore')
        outputMimeType = value.get('output-content-type', u'text/x-html-safe')\
                              .encode('utf-8', 'ignore')
        return RichTextValue(
            raw=raw,
            mimeType=mimeType,
            outputMimeType=outputMimeType,
            encoding=encoding
        )

If I check:

raw = value.get('data', '')

I get

u'<p>\xc5</p>' 

(google seems to think this is Å)

If I check:

value.get('data', '').encode(encoding)  

I get:

u'<p>\xc3\x85</p>'

Which also seems to be Å
(encoding='utf-8')

If I change the line that sets raw to

  raw = value.get('data', '')

it 'works OK' (I can save, edit and view the field (widget)).

Possibly there should be test whether value.get('data', '') returns bytes or str (unicode)... :thinking:

Can it be fixed like this:

…or could this be done in the RichTextValue itself?
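A minimal Python 3 sketch of the kind of bytes-or-str check discussed above. The helpers `to_text` and `richtext_kwargs` are hypothetical names invented for this example; the function returns a plain dict rather than a RichTextValue so that it runs without Plone installed, and it assumes that RichTextValue wants text for `raw`, which matches the observation that passing the unencoded value works.

```python
def to_text(value, encoding="utf-8"):
    """Return text: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def richtext_kwargs(value):
    """Hypothetical helper: normalise the serialized dict to text values.

    The result is meant to be passed on as RichTextValue(**kwargs);
    that last step is left out here to keep the sketch self-contained.
    """
    encoding = to_text(value.get("encoding", "utf-8"))
    return {
        "raw": to_text(value.get("data", ""), encoding),
        "mimeType": to_text(value.get("content-type", "text/html")),
        "outputMimeType": to_text(
            value.get("output-content-type", "text/x-html-safe")
        ),
        "encoding": encoding,
    }


# Text input and UTF-8 encoded input end up as the same text value.
from_text = richtext_kwargs({"data": "<p>\xc5</p>"})
from_bytes = richtext_kwargs({"data": b"<p>\xc3\x85</p>"})
print(from_text["raw"] == from_bytes["raw"])
```

Doing the check in one small helper keeps the converter itself free of type tests; the same helper could equally be applied inside RichTextValue, which is the alternative raised above.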

@espenmn This is now fixed in Plone 6.0.5.

See pull request #167 in plone/plone.app.z3cform on GitHub, by petschki: Remove invalid "unicode control characters" for `TextareaWidget` value.