plone.app.z3cform SelectFieldWidget - duplicate HTML &amp; entity in values

When using SelectFieldWidget, a value containing an ampersand is converted to:

Plone &amp;amp; Python

in the select2 <select> element of my field, which displays as "Plone &amp; Python".

The value is displayed correctly when using an AjaxSelectFieldWidget (which I use elsewhere, but which is not appropriate in this situation).

I checked makeItem() in plone.app.z3cform.widgets.SelectWidget, and the item['content'] value that gets passed to widget.pt is "Plone & Python".

After some hours of diving down the template rabbit holes in plone.app.z3cform, plone.z3cform, z3c.form and plone.app.widgets, I got thoroughly confused and am no closer to solving my issue.

Can any of you provide some insights?

It looks as if the value (Plone & Python) has been XML-quoted twice. The first quoting results in (Plone &amp; Python), the second in (Plone &amp;amp; Python). One quoting is normal and is performed (by default) by the templating logic as a protection against cross-site scripting attacks: Plone &amp; Python is the correct HTML representation of Plone & Python. You would need to find out where the second quoting comes from.
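To make the double quoting concrete, here is a minimal sketch using only the Python standard library (Python 3; `html.escape` stands in for whatever escaping the templating layer actually performs, which is an assumption on my part):

```python
from html import escape, unescape

label = "Plone & Python"

once = escape(label)   # the normal, expected escaping done by a template
twice = escape(once)   # a second, unwanted escaping pass

print(once)
print(twice)

# A browser decodes entities only once, so after double escaping
# the user still sees a literal "&amp;" on the page.
print(unescape(twice))
```

If the page source contains the `twice` form, some layer escaped a string that was already escaped.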

Yes, that was my conclusion too: somewhere there is a second quoting going on which should not be there. I then looked at whether the issue comes from my custom vocabulary or from further down the chain.

In the plone.app.z3cform.widget SelectWidget class, there is a function makeItem() which creates the value/label pair. It does not have any quoting yet.

The TAL template uses widget/render to create the field. Knowing where to find this render function would help a lot. I have a hunch that there is a missing "structure" keyword in the TAL element which provides the content for the HTML select element...

I then looked at the SelectWidget class in z3c.form.browser.select. Here it is addItem() which creates the list of value/label pairs. All good there too.

Eventually, the field is rendered by the render function in BaseWidget. It returns pattern_widget.render() which correctly outputs:

<select class="pat-select2 select-widget set-field" multiple="multiple" name="form.widgets.art_list" 
               data-pat-select2="{&quot;allowClear&quot;: true, &quot;multiple&quot;: true, &quot;separator&quot;: &quot;;&quot;}">
  <option value="00018">00018 Holz H&#246;sl Bambuspflege-&#214;l W</option>
  <option value="00055">00055 Holzland Peter &amp; Sohn Gr&#252;nbelagentferner</option>
</select>

How does &amp; get quoted twice, but not the other escaped characters?

Have you actually observed this? In the HTML code you presented, & is correctly quoted too.
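One rough way to check how many times a piece of markup has been quoted is to unescape it repeatedly until it stops changing. The sketch below is Python 3, and `escape_depth` is a hypothetical helper of my own, not part of Plone or z3c.form. It would misreport a label whose real text legitimately contains something like "&amp;", so treat it as a debugging aid only:

```python
from html import unescape


def escape_depth(text, limit=5):
    """Count the unescape passes needed before the text stops changing."""
    depth = 0
    while depth < limit:
        decoded = unescape(text)
        if decoded == text:
            break
        text = decoded
        depth += 1
    return depth


# Option text copied from the rendered <select> shown above.
rendered = "00055 Holzland Peter &amp; Sohn Gr&#252;nbelagentferner"
# What a doubly quoted ampersand would look like instead.
doubled = "00055 Holzland Peter &amp;amp; Sohn"

print(escape_depth(rendered))
print(escape_depth(doubled))
```

A depth of 1 means the string is quoted exactly once, which is what correct HTML should contain.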

If you really observe that only & is wrongly handled, then maybe the "output transform chain" introduced the error. The "output transform chain" is triggered by the "request finished event" (i.e. ZPublisher.interfaces.IPubEnd). If you call the view in an interactive Python interpreter (--> bin/{client1|instance} debug) and everything is correct but the browser presented representation is wrong, then there is some chance that the "output transform chain" is to blame. To verify, you would put the view result into the response object, "notify" this event and compare the transformation result (in the response object) with the view result.

This is the field using AjaxSelectWidget:

[Image: ajaxselectwidget-noamp]

This is the field using SelectWidget:

[Image: selectwidget-amp]

With AJAX, you often do not deliver HTML but raw data - and raw data does not need XML/HTML quoting. Also the "output transform chain" is (usually) not applied to AJAX responses (unless the response is of type text/xml or text/html).
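A small sketch of that difference, in Python 3 with the standard library only. The shape of the JSON payload (`results`, `id`, `text`) is a hypothetical example, not necessarily what the AjaxSelectWidget endpoint returns:

```python
import json
from html import escape

label = "Plone & Python"

# Raw data for an AJAX response: JSON has no need to escape "&".
ajax_payload = json.dumps({"results": [{"id": "1", "text": label}]})

# Server-rendered HTML: the label must be escaped exactly once.
html_option = '<option value="1">%s</option>' % escape(label)

print(ajax_payload)
print(html_option)
```

Because the AJAX variant never passes the label through HTML quoting on the server, a stray second quoting step would not show up there.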

I put a pdb trace in main_template:

(Pdb) print econtext['view']
<Products.Five.metaclass.EasyFormFormWrapper object at 0x7f8644a6ea90>
(Pdb) print econtext['view'].render()

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" lang="de">
...

Everything still looks as it should, meaning that the SelectWidget field is populated with correct value/label pairs. In the browser, however, the additional &amp; is still displayed.

I am not familiar enough with debug mode to know how to proceed. In pdb I can do this:

(Pdb) econtext['context'].REQUEST.RESPONSE.write(econtext['view'].render())
*** TypeError: Value must be a string
(Pdb) econtext['context'].REQUEST.RESPONSE.write(econtext['view'].render().encode('utf-8', 'xmlcharref'))

The second command renders the usual incorrect values (obviously).
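As a side note on the encoding step: the TypeError suggests that RESPONSE.write wants an encoded byte string rather than unicode text. The sketch below (Python 3, whereas the session above looks like Python 2) uses a sample string of my own and the standard `xmlcharrefreplace` error handler to show what such an encoding does. The handler only replaces characters the target encoding cannot represent and never touches "&", so this step alone would not be expected to add a second &amp;:

```python
text = "Holz H\u00f6sl & Sohn"  # sample label containing an umlaut and "&"

# UTF-8 can represent every character, so the error handler is never used.
as_utf8 = text.encode("utf-8", "xmlcharrefreplace")

# ASCII cannot represent the umlaut, so it becomes a numeric reference.
as_ascii = text.encode("ascii", "xmlcharrefreplace")

print(as_utf8)
print(as_ascii)
```

This would also fit the rendered output shown earlier, where umlauts appear as numeric references such as &#246; while the ampersand is quoted separately.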

For now, I have settled on a workaround: avoiding ampersands in my vocabulary. Thanks @dieter for your help so far. I would be interested in revisiting this as time permits...

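from plone.i18n.normalizer.interfaces import IFileNameNormalizer
from Products.CMFPlone.interfaces import ILanguage
from zope.component import getUtility

# Look up the normalizer utility and the language of the current context.
label_normalizer = getUtility(IFileNameNormalizer)
target_language = ILanguage(context).get_language()

# Normalize the label using the locale-specific character mapping.
label = label_normalizer.normalize(u'%s %s' % (article_number, product_title), locale=target_language)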

I am using a custom mapping for plone.i18n.normalizer.de.py which includes 38 : 'und' (38 is the code point of the & character).

Resulting in

[Image: selectwidget-workaround]
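The effect of the 38 : 'und' mapping entry can be illustrated with plain Python 3. This is only a sketch of the character substitution: the real plone.i18n normalizer does more than this, and the sample label is my own:

```python
# Code point -> replacement text, mirroring the single custom mapping entry.
mapping = {38: "und"}

label = "Holzland Peter & Sohn"

# str.translate replaces each character whose code point is a key in the mapping.
normalized = label.translate(mapping)

print(ord("&"))
print(normalized)
```

With the ampersand gone from the label, there is nothing left for the second quoting step to mangle, which is why the workaround hides the problem without fixing its cause.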