Collectionfilter unicode error

I have a Plone 5.1.5 site with c.collectionfilter that sorts on 'contributors'. One of the names causes a problem: his middle name contains the Norwegian character 'ø', so for now I have simply dropped the middle name.

Does anyone have code (for the indexer?) that works with c.collectionfilter?

If not, where would be the right place to fix this?

    Traceback (innermost last):
      Module ZPublisher.Publish, line 138, in publish
      Module ZPublisher.mapply, line 77, in mapply
      Module ZPublisher.Publish, line 48, in call_object
      Module plone.app.portlets.browser.utils, line 36, in render_portlet
      Module Products.Five.browser.pagetemplatefile, line 125, in __call__
      Module Products.Five.browser.pagetemplatefile, line 59, in __call__
      Module zope.pagetemplate.pagetemplate, line 137, in pt_render
      Module five.pt.engine, line 98, in __call__
      Module z3c.pt.pagetemplate, line 163, in render
      Module chameleon.zpt.template, line 261, in render
      Module chameleon.template, line 191, in render
      Module chameleon.template, line 171, in render
      Module d30a76ea83cbbbdeb8164096b2616e85.py, line 218, in render
      Module five.pt.expressions, line 161, in __call__
      Module collective.collectionfilter.baseviews, line 130, in results
      Module plone.memoize.volatile, line 68, in replacement
      Module collective.collectionfilter.filteritems, line 114, in get_filter_items
      Module plone.app.contenttypes.behaviors.collection, line 121, in results
      Module plone.app.querystring.querybuilder, line 98, in __call__
      Module plone.app.querystring.querybuilder, line 170, in _makequery
      Module Products.CMFPlone.CatalogTool, line 526, in searchResults
      Module Products.ZCatalog.ZCatalog, line 604, in searchResults
      Module Products.ZCatalog.Catalog, line 1072, in searchResults
      Module Products.ZCatalog.Catalog, line 549, in search
      Module Products.PluginIndexes.common.UnIndex, line 426, in _apply_index
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

     - Stream:     Heine ø
                          ^
     - Expression: "view/results"
     - Filename:   ... collective/collectionfilter/portlets/collectionfilter.pt
     - Location:   (line 9: col 64)
     - Source:     ... ntent" tal:define="results view/results">
                                                  ^^^^^^^^^^^^
     - Arguments:  repeat: {...} (0)
                   template: <ViewPageTemplateFile - at 0x7f981341c590>
                   views: <ViewMapper - at 0x7f98132306d0>
                   modules: <instance - at 0x7f982df75b00>
                   args: <tuple - at 0x7f983d287050>
                   here: <ImplicitAcquisitionWrapper boker at 0x7f9813232dc0>
                   user: <ImplicitAcquisitionWrapper - at 0x7f9813232f50>
                   nothing: <NoneType - at 0x8f5320>
                   container: <ImplicitAcquisitionWrapper boker at 0x7f9813232dc0>
                   input_type: checkbox
                   request: <instance - at 0x7f98132185a8>
                   wrapped_repeat: <SafeMapping - at 0x7f98134f3578>
                   traverse_subpath: <list - at 0x7f9813168f80>
                   default: <object - at 0x7f983d1a4540>
                   loop: {...} (0)
                   context: <ImplicitAcquisitionWrapper boker at 0x7f9813232dc0>
                   view: <Renderer - at 0x7f98132cc210>
                   translate: <function translate at 0x7f98135d6398>
                   root: <ImplicitAcquisitionWrapper Zope at 0x7f98139fb3c0>
                   options: {...} (0)
                   target_language: <NoneType - at 0x8f5320>
    2022-09-05 16:18:45 ERROR Zope.SiteErrorLog 1662387525.370.630754071238 http://www.marfag.no/boker/summary_view
    Traceback (innermost last):
      Module ZPublisher.Publish, line 138, in publish
      Module ZPublisher.mapply, line 77, in mapply
      Module ZPublisher.Publish, line 48, in call_object
      Module five.customerize.zpt, line 83, in __call__
      Module Products.PageTemplates.ZopePageTemplate, line 338, in _exec
      Module Products.PageTemplates.ZopePageTemplate, line 435, in pt_render
      Module Products.PageTemplates.PageTemplate, line 87, in pt_render
      Module zope.pagetemplate.pagetemplate, line 137, in pt_render
      Module five.pt.engine, line 98, in __call__
      Module z3c.pt.pagetemplate, line 163, in render
      Module chameleon.zpt.template, line 261, in render
      Module chameleon.template, line 191, in render
      Module chameleon.template, line 171, in render
      Module adf7b50ccdae697b764f8e8d40d45eea.py, line 527, in render
      Module f4e43ab743022a4ffdcd5ad92eeada24.py, line 929, in render_master
      Module f4e43ab743022a4ffdcd5ad92eeada24.py, line 289, in render_content
      Module adf7b50ccdae697b764f8e8d40d45eea.py, line 515, in __fill_content_core
      Module 1f559a2a75dd873e80391f4ca861aba8.py, line 538, in render_content_core
      Module 1f559a2a75dd873e80391f4ca861aba8.py, line 265, in render_listing
      Module five.pt.expressions, line 161, in __call__
      Module plone.app.contenttypes.browser.collection, line 46, in batch
      Module plone.app.contenttypes.browser.collection, line 41, in results
      Module plone.app.contenttypes.behaviors.collection, line 121, in results
      Module plone.app.querystring.querybuilder, line 98, in __call__
      Module plone.app.querystring.querybuilder, line 170, in _makequery
      Module Products.CMFPlone.CatalogTool, line 526, in searchResults
      Module Products.ZCatalog.ZCatalog, line 604, in searchResults
      Module Products.ZCatalog.Catalog, line 1072, in searchResults
      Module Products.ZCatalog.Catalog, line 549, in search
      Module Products.PluginIndexes.common.UnIndex, line 426, in _apply_index
    UnicodeDecodeError: getField

     - Stream:     Heine ø
                          ^
     - Expression: "view/batch"
     - Filename:   ... .egg/plone/app/contenttypes/browser/templates/listing.pt
     - Location:   (line 22: col 31)
     - Source:     <tal:results define="batch view/batch;
                                              ^^^^^^^^^^
     - Expression: "nocall:context/getField"
     - Filename:   <string>
     - Location:   (line 0: col 0)
     - Expression: "nocall:context/getField"
     - Filename:   <string>
     - Location:   (line 0: col 0)
     - Expression: "nocall:context/getField"
     - Filename:   <string>
     - Location:   (line 0: col 0)
     - Expression: "nocall:context/getField"
     - Filename:   <string>
     - Location:   (line 0: col 0)

There was a bugfix in collective.collectionfilter that may be relevant to your use case: "Fix bug where filter urls was getting utf encoded then made into unicode again [djay]".
It is described in CHANGES.rst on the main branch of collective/collective.collectionfilter on GitHub.
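To illustrate what "utf encoded" means for a filter value that travels in a URL, here is a small Python 3 sketch using only the standard library (it is not collectionfilter's own code): the 'ø' is percent-encoded as its two UTF-8 bytes, and it has to be decoded exactly once to get the original text back.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

name = "Heine ø"

# Percent-encoding a text value encodes it as UTF-8 first: 'ø' becomes two bytes.
encoded = quote(name)
print(encoded)  # Heine%20%C3%B8

# Decoding once gives back the original text.
assert unquote(encoded) == name

# Decoding to bytes instead leaves the raw UTF-8 bytes, which code running
# under Python 2 would still have to decode before mixing them with unicode.
raw = unquote_to_bytes(encoded)
print(raw)  # b'Heine \xc3\xb8'
assert raw.decode("utf-8") == name
```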

I see that I have 3.2.1 installed, but I tested 3.5 and 3.1 and both produced the same error.

Maybe the solution could be to "use unicode instead of ascii" for the contributors index?

If the list of contributors is not giving you issues in other situations (e.g. tags on the document itself), then I would look at the values the querybuilder is passing to the catalog search. This will help you find out where the error is taking place.
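One way to do that is to dump each value in the query together with its type, for example from a pdb breakpoint placed just before the catalog search. The helper below is hypothetical and assumes the query is a dict shaped like `{'contributors': {'query': [...], 'operator': 'or'}}`; under Python 2 the type names would read `str` (bytes) and `unicode` (text) instead of `bytes` and `str`.

```python
def describe_query_values(query):
    """Return (index, value, type name) for every value in a catalog query.

    Assumes each criterion is either a plain value or a dict with a
    'query' key, as in the query dicts printed in this thread.
    """
    report = []
    for index, spec in query.items():
        # Criteria like {'query': [...], 'operator': 'or'} wrap their values.
        values = spec.get("query") if isinstance(spec, dict) else spec
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            report.append((index, value, type(value).__name__))
    return report


example = {
    "Subject": {"query": [b"\xc3\x98"]},
    "contributors": {"query": ["Heine ø"], "operator": "or"},
}
for row in describe_query_values(example):
    print(row)
```

Mixed types across indexes (bytes for one, text for another) are exactly the kind of difference worth looking for.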

Have you tried upgrading to Plone 5.1-latest? These types of corner cases have a tendency to magically disappear in later versions.

It is on a live site, so I don't really want to change too much unless I have to. I will need to test it on a copy (or wait until the teachers at that school go on strike, probably in a few days).

Update: I checked, and the Subject index works without a Unicode error. That is a bit strange, since both use a KeywordIndex (and I thought both were Unicode indexes).

Update 2: This is even stranger (?). I removed the Subject index, indexed another field with a Subject index, and c.collectionfilter still shows the (right) Subject values in the portlet. (It is not a caching issue; I made a new …)

To me, the value that is passed for "Subject" and "contributors" looks exactly the same (both show 'query': '\xc3\x86\xc3\x98\xc3\x85'), but only "Subject" works, so I have a feeling something else is going on.
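The logged value '\xc3\x86\xc3\x98\xc3\x85' is how Python 2 prints a UTF-8 encoded byte string (`str`), not Unicode text. A quick Python 3 check of what those bytes contain:

```python
# The value from the log, written as a Python 3 bytes literal.
logged = b"\xc3\x86\xc3\x98\xc3\x85"

# Decoding it as UTF-8 gives the three Norwegian letters.
text = logged.decode("utf-8")
print(text)  # ÆØÅ

# Each letter takes two bytes in UTF-8: 6 bytes, but 3 characters.
assert len(logged) == 6 and len(text) == 3
assert text.encode("utf-8") == logged
```

So both queries carry the same UTF-8 bytes. If one index works and the other fails, the difference is more likely in what each index has stored (bytes in one, Unicode in the other) than in the query itself.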

I just installed collectionfilter 4.0 on Plone 5.2.6, and the Unicode errors are gone.

I could not find anything in any changelog that explains why.

Does anyone have an idea why it works now?

(PS: I have some dependencies that prevent me from upgrading the site.)

Which Python version? In py2, the Subject index held UTF-8 encoded strings; that's why there were always problems when searching by that index. Since py3, everything is Unicode and these problems should be gone.
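Why mixing the two breaks in py2: when a byte string (`str`) meets a `unicode` value in an ordering comparison, such as the key lookups in the BTrees behind a KeywordIndex, Python 2 implicitly decodes the bytes with the ASCII codec, which fails on the first non-ASCII byte (0xc3 here). Python 3 no longer does this implicitly, so the sketch below simulates that coercion explicitly; `py2_coerce` is a hypothetical illustration, not part of Zope or Python.

```python
def py2_coerce(a, b):
    """Mimic Python 2's implicit str/unicode coercion for two values.

    If one value is bytes and the other is text, Python 2 decoded the
    bytes with the default ASCII codec before comparing them.
    """
    if isinstance(a, bytes) and isinstance(b, str):
        a = a.decode("ascii")
    elif isinstance(a, str) and isinstance(b, bytes):
        b = b.decode("ascii")
    return a, b


stored = "Heine ø"                   # unicode value stored in the index
queried = "Heine ø".encode("utf-8")  # UTF-8 bytes coming from the query

try:
    py2_coerce(stored, queried)
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
    print(exc)
```

When both sides have the same type (both bytes or both text), no decoding happens and nothing fails, which is why encoding every index value the same way matters.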

Actually, Subject always works, but any other field that uses the same index type (KeywordIndex) gives a Unicode error.

PS: Both are tested with Python 2.7

Then your indexing method should encode everything the same way the Subject indexer does; see plone.dexterity/content.py on the 2.6.x branch of plone/plone.dexterity on GitHub.
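A minimal sketch of that normalization, assuming the index should hold UTF-8 byte strings the way Subject does. The helper name `to_utf8_keywords` is hypothetical; in a real indexer you would apply it to the field value and return the result. It encodes only text values and passes byte strings through unchanged, because in Python 2 calling `.encode('utf-8')` on a value that is already a UTF-8 `str` first decodes it as ASCII and fails with "'ascii' codec can't decode byte 0xc3 in position 0". Since `bytes` is an alias for `str` in Python 2.7, the same code runs there too.

```python
def to_utf8_keywords(values):
    """Normalize keyword values to a tuple of UTF-8 byte strings.

    Text values are encoded; values that are already bytes are kept as-is,
    so calling the function twice on the same data is harmless.
    """
    result = []
    for value in values or ():
        if isinstance(value, bytes):
            result.append(value)  # already encoded, do not encode again
        else:
            result.append(value.encode("utf-8"))
    return tuple(result)


print(to_utf8_keywords(["Heine ø", b"Ola"]))  # (b'Heine \xc3\xb8', b'Ola')
```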

I suppose that is the way to go, but it gives me a Unicode error too (UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)).

UPDATE: Wild guess, but could it be that somewhere else, something is trying to convert to Unicode a value that is already Unicode? I checked, and the content of 'creators' is already Unicode.

UPDATE 2: I installed Plone 5.2.x and collectionfilter 3.5.2, and it does not work (no errors are shown, but the result is not filtered).
Then I updated some (py) files with those from the latest 4.x branch, and it worked. Then I used the same setup on a 5.1.5 site, and it did not work (same Unicode errors as before).

UPDATE 3 (December 5, 2022): By accident, I discovered that collectionfilter has its own Subject indexer, which does:

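    from plone.app.dexterity.behaviors.metadata import ICategorization
    from plone.dexterity.interfaces import IDexterityContent
    from plone.indexer import indexer
    # EMPTY_MARKER is a constant defined inside collective.collectionfilter.

    @indexer(IDexterityContent)
    def subject_indexer(obj):
        """Subject indexer. Returns EMPTY_MARKER, if no subjects are set.
        """
        cat = ICategorization(obj, None)
        cats = getattr(cat, 'subjects', None) or (EMPTY_MARKER, )
        # Python 2: store every subject as a UTF-8 encoded byte string.
        cats = tuple([it.encode('utf-8', 'replace') for it in cats])
        return cats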