[SOLVED] How to search for keywords that have 'unicode/special' characters?

How should one search for keywords that contain unicode characters?

I get the keyword from:

   <field name="keyword" type="zope.schema.Choice">
     <vocabulary>plone.app.vocabularies.Keywords</vocabulary>
   </field>

And search for them with:

items = self.context.portal_catalog(Subject=keyword, Language=language)

This works, but when I add the keyword 'Sjåfør', I get an error, even if I search for a different keyword. I tried a few approaches, like searching in TAL instead, but the errors are all variations of the following:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

 - Stream:     sjåfør
                 ^
 - Expression: "python:context.portal_catalog(Subject=keyword, Language=language)"
 - Filename:   features
 - Location:   (line 16: col 26)
 - Arguments:  repeat: {...} (0)
               template: <ImplicitAcquisitionWrapper features at 0x7f080968e8c0>
               modules: <_SecureModuleImporter - at 0x7f0813c9cd10>
               here: <ImplicitAcquisitionWrapper skoler at 0x7f080980c320>
               wrapped_repeat: <SafeMapping - at 0x7f0802877518>
               portal: <ImplicitAcquisitionWrapper russ at 0x7f080968ef00>
               user: <ImplicitAcquisitionWrapper - at 0x7f0809842640>
               nothing: <NoneType - at 0x8f5320>
               target_language: <NoneType - at 0x8f5320>
               family_css: fa
               container: <ImplicitAcquisitionWrapper skoler at 0x7f080980c320>
               keyword: skole
               language: no
               title: <NoneType - at 0x8f5320>
               request: <WSGIRequest - at 0x7f0802949dd0>
               portal_url: http://russ.medialog.no
               default: <object - at 0x7f0822cf2770>
               css_file: features
               loop: {...} (0)
               context: <ImplicitAcquisitionWrapper skoler at 0x7f080980c320>
               translate: <function translate at 0x7f08028065f0>
               root: <ImplicitAcquisitionWrapper  at 0x7f080a7c9500>
               options: {...} (1)
               view: <FragmentView features at 0x7f080813acd0>
/home/medialog/vol2/instance8088/buildout-cache/eggs/z3c.form-3.7.0-py2.7.egg/z3c/form/util.py:180: UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if (not dm.canAccess() or dm.query() != value):
2020-09-01 15:52:29,675 WARNING [plone.jsonserializer:93][waitress] Constraint not satisfied for value "sjåfør" of field "keyword". Returning None instead.
2020-09-01 15:52:29,675 WARNING [plone.jsonserializer:49][waitress] Deserializer not found for value "sjåfør" of field "keyword". Returning None instead.
2020-09-01 15:52:32,753 WARNING [plone.jsonserializer:93][waitress] Constraint not satisfied for value "sjåfør" of field "keyword". Returning None instead.
2020-09-01 15:52:32,754 WARNING [plone.jsonserializer:49][waitress] Deserializer not found for value "sjåfør" of field "keyword". Returning None instead.
2020-09-01 15:52:39,427 WARNING [plone.jsonserializer:93][waitress] Constraint not satisfied for value "sjåfør" of field "keyword". Returning None instead.
2020-09-01 15:52:39,428 WARNING [plone.jsonserializer:49][waitress] Deserializer not found for value "sjåfør" of field "keyword". Returning None instead.
2020-09-01 15:52:39,540 WARNING [plone.jsonserializer:93][waitress] Constraint not satisfied for value "sjåfør" of field "keyword". Returning None instead.
2020-09-01 15:52:39,540 WARNING [plone.jsonserializer:49][waitress] Deserializer not found for value "sjåfør" of field "keyword". Returning None instead.

The traceback appears incomplete

My latest attempt (with TAL) gives the following

(where 'skole' is the keyword I search for, while 'sjåfør' is one that exists but is not searched for):

<div tal:define="items python:context.portal_catalog(Subject='%s' % keyword, Language='%s' % language)">

  2020-09-01 17:20:10,835 ERROR   [Zope.SiteErrorLog:251][waitress] 1598973610.830.186061447556 http://xxxxxx.medialog.no/skoler/@@collective.themefragments.fragment/cb00ee56a0114390a24ac62431ae2abe
  Traceback (innermost last):
    Module ZPublisher.WSGIPublisher, line 156, in transaction_pubevents
    Module ZPublisher.WSGIPublisher, line 338, in publish_module
    Module ZPublisher.WSGIPublisher, line 256, in publish
    Module ZPublisher.mapply, line 85, in mapply
    Module ZPublisher.WSGIPublisher, line 62, in call_object
    Module collective.themefragments.tiles, line 225, in __call__
    Module collective.themefragments.traversal, line 173, in __call__
    Module Products.PageTemplates.ZopePageTemplate, line 284, in _exec
    Module Products.PageTemplates.ZopePageTemplate, line 371, in pt_render
    Module Products.PageTemplates.PageTemplate, line 85, in pt_render
    Module zope.pagetemplate.pagetemplate, line 135, in pt_render
    Module Products.PageTemplates.engine, line 88, in __call__
    Module z3c.pt.pagetemplate, line 173, in render
    Module chameleon.zpt.template, line 306, in render
    Module chameleon.template, line 209, in render
    Module chameleon.template, line 187, in render
    Module fa6908e37c0d9d8b9f80614acc16d8e8.py, line 330, in render
    Module Products.CMFPlone.CatalogTool, line 457, in searchResults
    Module Products.ZCatalog.ZCatalog, line 611, in searchResults
    Module Products.ZCatalog.Catalog, line 1091, in searchResults
    Module Products.ZCatalog.Catalog, line 634, in search
    Module Products.ZCatalog.Catalog, line 564, in _search_index
    Module Products.PluginIndexes.unindex, line 577, in query_index
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

   - Stream:     sjåfør
                   ^
   - Expression: "python:context.portal_catalog(Subject='%s' % keyword, Language='%s' % language)"
   - Filename:   features
   - Location:   (line 16: col 26)
   - Arguments:  repeat: {...} (0)
                 template: <ImplicitAcquisitionWrapper features at 0x7fcad4031e60>
                 modules: <_SecureModuleImporter - at 0x7fcade621d10>
                 here: <ImplicitAcquisitionWrapper skoler at 0x7fcad4113960>
                 wrapped_repeat: <SafeMapping - at 0x7fcad5275878>
                 portal: <ImplicitAcquisitionWrapper xxxxxx at 0x7fcad4331c30>
                 user: <ImplicitAcquisitionWrapper - at 0x7fcad4031dc0>
                 nothing: <NoneType - at 0x8f5320>
                 target_language: <NoneType - at 0x8f5320>
                 family_css: fa
                 container: <ImplicitAcquisitionWrapper skoler at 0x7fcad4113960>
                 keyword: skole
                 language: no
                 title: <NoneType - at 0x8f5320>
                 request: <WSGIRequest - at 0x7fcad12a98d0>
                 portal_url: http://xxxxxx.medialog.no
                 default: <object - at 0x7fcaed677770>
                 css_file: features
                 loop: {...} (0)
                 context: <ImplicitAcquisitionWrapper skoler at 0x7fcad4113960>
                 translate: <function translate at 0x7fcad1562578>
                 root: <ImplicitAcquisitionWrapper  at 0x7fcad414e230>
                 options: {...} (1)
                 view: <FragmentView features at 0x7fcad1125210>

Searching in view.py

items = self.context.portal_catalog(Subject=keyword, Language=self.context.Language) 

gives

  2020-09-01 17:29:29,463 ERROR   [Zope.SiteErrorLog:251][waitress] 1598974169.460.0658761831569 http://xxxxxx.medialog.no/skoler/@@collective.themefragments.fragment/cb00ee56a0114390a24ac62431ae2abe
  Traceback (innermost last):
    Module ZPublisher.WSGIPublisher, line 156, in transaction_pubevents
    Module ZPublisher.WSGIPublisher, line 338, in publish_module
    Module ZPublisher.WSGIPublisher, line 256, in publish
    Module ZPublisher.mapply, line 85, in mapply
    Module ZPublisher.WSGIPublisher, line 62, in call_object
    Module collective.themefragments.tiles, line 225, in __call__
    Module collective.themefragments.traversal, line 173, in __call__
    Module Products.PageTemplates.ZopePageTemplate, line 284, in _exec
    Module Products.PageTemplates.ZopePageTemplate, line 371, in pt_render
    Module Products.PageTemplates.PageTemplate, line 85, in pt_render
    Module zope.pagetemplate.pagetemplate, line 135, in pt_render
    Module Products.PageTemplates.engine, line 88, in __call__
    Module z3c.pt.pagetemplate, line 173, in render
    Module chameleon.zpt.template, line 306, in render
    Module chameleon.template, line 209, in render
    Module chameleon.template, line 187, in render
    Module 08305fbaa5dc1e7ffa1f8422e4d30bcc.py, line 330, in render
    Module Products.PageTemplates.expression, line 81, in __call__
    Module Products.PageTemplates.Expressions, line 122, in render
    Module script, line 39, in get_items
    Module script, line 35, in get_items
    Module Products.CMFPlone.CatalogTool, line 457, in searchResults
    Module Products.ZCatalog.ZCatalog, line 611, in searchResults
    Module Products.ZCatalog.Catalog, line 1091, in searchResults
    Module Products.ZCatalog.Catalog, line 634, in search
    Module Products.ZCatalog.Catalog, line 564, in _search_index
    Module Products.PluginIndexes.unindex, line 577, in query_index
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

   - Stream:     sjåfør
                   ^
   - Expression: "view/get_items"
   - Filename:   features
   - Location:   (line 16: col 26)
   - Arguments:  repeat: {...} (0)
                 template: <ImplicitAcquisitionWrapper features at 0x7f565042bcd0>
                 modules: <_SecureModuleImporter - at 0x7f5654f9ad10>
                 here: <ImplicitAcquisitionWrapper skoler at 0x7f565039e2d0>
                 wrapped_repeat: <SafeMapping - at 0x7f564ee29fc8>
                 portal: <ImplicitAcquisitionWrapper xxxxxx at 0x7f565040c870>
                 user: <ImplicitAcquisitionWrapper - at 0x7f565118a460>
                 nothing: <NoneType - at 0x8f5320>
                 target_language: <NoneType - at 0x8f5320>
                 family_css: fa
                 container: <ImplicitAcquisitionWrapper skoler at 0x7f565039e2d0>
                 keyword: skole
                 language: no
                 title: <NoneType - at 0x8f5320>
                 request: <WSGIRequest - at 0x7f564eaf7a10>
                 portal_url: http://xxxxxx.medialog.no
                 default: <object - at 0x7f5663ff0770>
                 css_file: features
                 loop: {...} (0)
                 context: <ImplicitAcquisitionWrapper skoler at 0x7f565039e2d0>
                 translate: <function translate at 0x7f564bb80c80>
                 root: <ImplicitAcquisitionWrapper  at 0x7f565042baa0>
                 options: {...} (1)
                 view: <FragmentView features at 0x7f564eaf7d90>

Python 2 or 3? Are you sure that you are passing the query as unicode (Py 2) or str (Py 3) and not as a UTF-8-encoded string?

It is Python2.

I am not sure whether it is unicode or UTF-8. Since this is a theme fragment, I don't have access to pdb.

What I find a bit strange is that it works perfectly OK if the catalog does not have any items tagged with 'å', but it does not if it has, even if I don't query for it. In my case, I query for 'skole' and it does not work because the catalog has one item tagged with 'sjåfør'.

Anyway, ensure that your query is unicode/str and not UTF-8 encoded. Perhaps there is also a mixture of unicode strings and UTF-8 strings in the catalog, or you are running a unicode query against UTF-8-encoded indexed data.
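
A minimal sketch of what seems to be going on, assuming the index holds 'sjåfør' as a UTF-8 byte string while the query value is unicode. As far as I understand, on Python 2 an ordering comparison between the two makes the interpreter decode the byte string with the ASCII codec, and that fails at the first byte of 'å'. The snippet below does that decode explicitly, so it behaves the same on Python 2 and 3. The variable names are only for illustration.

```python
# The keyword as UTF-8 bytes (assumed to be what ended up in the index).
stored = u"sj\u00e5f\u00f8r".encode("utf-8")

try:
    # Python 2 attempts this decode implicitly when unicode meets a byte string.
    stored.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Shows the codec, the offending byte and its position.
print(error)
```

The position and byte reported here line up with the "byte 0xc3 in position 2" part of the traceback, which would fit the mixed-types theory.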

Update: All queries for keywords that were added before 'sjåfør' work, but not those for keywords added later.

Indeed, it looks like the keyword (Subject) index is plain text by default, while schema.Choice with the Keywords vocabulary stores UTF-8.

So something similar to this will work:

    def get_items(self):
        # Turn the value from the schema field into a plain byte string before querying
        keyword = self.data['keyword'].encode('ascii', 'ignore')
        return self.context.portal_catalog(Subject=keyword)
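
One thing to be aware of with `encode('ascii', 'ignore')`: it silently drops every character that is not ASCII, so a query for 'sjåfør' itself would be sent as 'sjfr' and would presumably not match anything. A small comparison with UTF-8 encoding, which keeps all characters. Whether the UTF-8 form matches depends on the stored keywords being UTF-8 byte strings, which is an assumption here.

```python
keyword = u"sj\u00e5f\u00f8r"

# 'ignore' drops the characters that cannot be represented in ASCII.
ascii_only = keyword.encode("ascii", "ignore")

# UTF-8 keeps every character, using two bytes each for the two Norwegian letters.
utf8_bytes = keyword.encode("utf-8")

print(repr(ascii_only))
print(repr(utf8_bytes))
```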

I had the very same traceback after a py2 -> py3 database migration. In an upgrade step I iterated over all content and for each I did:

        for fieldname in ("subjects", "tags"):
            values = getattr(item, fieldname, None)
            if values:
                new_values = []
                for value in values:
                    if isinstance(value, bytes):
                        value = str(value, encoding="utf8")
                    new_values.append(value)
                setattr(item, fieldname, tuple(new_values))

(besides subjects I also have an additional tags field in this project)
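
The conversion itself can be pulled out into a small helper that is easy to try without a site. This is only a sketch: `to_text_tuple` is a name made up here, and it assumes any byte strings are UTF-8 encoded.

```python
def to_text_tuple(values):
    """Return values as a tuple of text, decoding any UTF-8 byte strings."""
    result = []
    for value in values or ():
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        result.append(value)
    return tuple(result)


# A mixed list of a byte string and a text string comes back as text only.
print(to_text_tuple([b"sj\xc3\xa5f\xc3\xb8r", u"skole"]))
```

In the loop above it would be used as `setattr(item, fieldname, to_text_tuple(values))`.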

I have a feeling that the reason is related to the changes made in Plone to make it run on Python 3, too. I had the error in 5 and it looks a bit similar to other errors I have had (due to six / bytes / string).

No idea if it is related, but I also noticed:

 plone/jsonserializer/serializer/converters.py:37: DeprecationWarning: getSiteEncoding: `getSiteEncoding` is deprecated. Plone only supports UTF-8 currently. This method always returns "utf-8"

I have not tried too much, since this is a running site that just got its first keyword with a Norwegian character.

Is there any way to check what "Subject" is stored as?
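
One possible way, sketched under the assumption that you can run a snippet against the site (for example from a browser view): `portal_catalog.uniqueValuesFor('Subject')` returns the values held in the index, and counting their types would show whether byte strings and unicode are mixed. `count_types` is a hypothetical helper, and the sample list only stands in for the catalog call.

```python
from collections import Counter


def count_types(values):
    """Count how many values there are of each Python type."""
    return Counter(type(value).__name__ for value in values)


# On a real site this would be: portal_catalog.uniqueValuesFor('Subject')
sample = [b"skole", u"sj\u00e5f\u00f8r", b"russ"]
print(count_types(sample))
```

If more than one type shows up in the result, the index contains a mixture.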