TypeError in ZCTextIndex after upgrade to Py3

I upgraded my app to Py3 two days ago, and today I received the following error:

2020-07-27 07:40:49,668 - [ERROR] - Traceback (most recent call last):
  File "/srv/bliss/deployment/work/source/bliss.git/src/Products/BlissSupportMailer/BlissSupportMailer.py", line 212, in _import_support_message
    parser_response = support_email.parseFromString(message_data)
  File "/srv/bliss/deployment/work/source/bliss.git/src/Products/BlissSupport/SupportEmail.py", line 447, in parseFromString
    parent_message = self._connect_to_parent_message(message)
  File "/srv/bliss/deployment/work/source/bliss.git/src/Products/BlissSupport/SupportEmail.py", line 117, in _connect_to_parent_message
    results = mm.subClassCatalog('AbstractSupportEmail').searchResults(msgId=in_reply_to_message_id)
  File "/srv/bliss/deployment/work/source/bliss.git/src/Products/SubClassCatalog/SubClassCatalog.py", line 83, in searchResults
    results.extend(catalog.searchResults(query))
  File "/srv/bliss/.batou-shared-eggs/Products.ZCatalog-5.0.4-py3.7.egg/Products/ZCatalog/ZCatalog.py", line 611, in searchResults
    return self._catalog.searchResults(query, **kw)
  File "/srv/bliss/.batou-shared-eggs/Products.ZCatalog-5.0.4-py3.7.egg/Products/ZCatalog/Catalog.py", line 1091, in searchResults
    return self.search(query, sort_indexes, reverse, sort_limit, _merge)
  File "/srv/bliss/.batou-shared-eggs/Products.ZCatalog-5.0.4-py3.7.egg/Products/ZCatalog/Catalog.py", line 634, in search
    rs = self._search_index(cr, index_id, query, rs)
  File "/srv/bliss/.batou-shared-eggs/Products.ZCatalog-5.0.4-py3.7.egg/Products/ZCatalog/Catalog.py", line 564, in _search_index
    index_rs = index.query_index(index_query, rs)
  File "/srv/bliss/.batou-shared-eggs/Products.ZCatalog-5.0.4-py3.7.egg/Products/ZCTextIndex/ZCTextIndex.py", line 210, in query_index
    results = tree.executeQuery(self.index)
  File "/srv/bliss/.batou-shared-eggs/Products.ZCatalog-5.0.4-py3.7.egg/Products/ZCTextIndex/ParseTree.py", line 132, in executeQuery
    return index.search_phrase(self.getValue())
  File "/srv/bliss/.batou-shared-eggs/Products.ZCatalog-5.0.4-py3.7.egg/Products/ZCTextIndex/BaseIndex.py", line 217, in search_phrase
    if docwords.find(code) >= 0:
TypeError: argument should be integer or bytes-like object, not 'str'
 (server time: 20200727T07:40:49)

Luckily, there is a hint in the Zope documentation:
https://zope.readthedocs.io/en/latest/migrations/zope4/zodb.html#going-from-python-2-to-python-3

If your application uses the ZCatalog and there are problems with any of them, do a clear and rebuild.

And actually, this solved my problem.

I am just curious - as I have hundreds of indexes.

Why was this one affected? What about my other indexes? Re-building all of them would take a lot of time.

Any hint why this was broken?

The indexed information is the In-Reply-To value of an email message object.

As you know, Python 2 did not clearly distinguish between text and binary data. Most instances of str objects in Python 2 represent (encoded) text, but some may contain binary data.

The ZODB conversion from Python 2 to Python 3 tries to accommodate this with a heuristic: convert Python 2 str into Python 3 str, unless otherwise specified. This gives the desired result in most cases. But the ZCTextIndex actually uses binary data (for the "WID" (= "WordInDEX") encoded document content), and there the heuristic breaks (apparently, no one has told zodbupdate about this exception yet).
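The traceback itself shows which way round the mismatch is: `bytes.find()` was called with a `str` argument, so `docwords` was a bytes object and `code` a `str`. A minimal sketch reproducing the same failure outside Zope, with made-up byte values standing in for real WID-encoded data (`phrase_in_doc` is a hypothetical stand-in for the check in `BaseIndex.search_phrase`):

```python
# Made-up values: in the real index, `docwords` is the stored WID-encoded
# document and `code` is the WID-encoded query phrase.
docwords = b"\x81\x82\x83\x84"  # stored value, a bytes object
code = "\x82\x83"               # query value, a str object


def phrase_in_doc(docwords, code):
    """Same kind of substring test that BaseIndex.search_phrase performs."""
    return docwords.find(code) >= 0


try:
    phrase_in_doc(docwords, code)
except TypeError as exc:
    # bytes.find() refuses a str argument, as in the traceback in the question
    print("TypeError:", exc)

# With both sides of the same type, the check works as intended.
print(phrase_in_doc(docwords, b"\x82\x83"))  # True
```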

Thank you very much!

I am still puzzled why this problem did not show up in either my Python 3 dev environment or the testing server, as both use a copy of the production ZODB.

I just faced a similar error in a CMS product.

A search for "test" gave no error, but a search for "192.168.11" raised the same error as above:

Module Products.ZCatalog.ZCatalog, line 611, in searchResults
Module Products.ZCatalog.Catalog, line 1091, in searchResults
Module Products.ZCatalog.Catalog, line 634, in search
Module Products.ZCatalog.Catalog, line 564, in _search_index
Module Products.ZCTextIndex.ZCTextIndex, line 210, in query_index
Module Products.ZCTextIndex.ParseTree, line 132, in executeQuery
Module Products.ZCTextIndex.BaseIndex, line 217, in search_phrase
TypeError: argument should be integer or bytes-like object, not 'str'

Now, there is only one option - I need to rebuild all my ZCTextIndexes.

I have 874 instances of ZCTextIndex - so this is not a task to do manually.

Rebuilding some of them will take up to 5-10 minutes each...

Any hint how to proceed?

Writing the script is the easy part... Is it feasible to run this in production? Or should I better shut down the server, and rebuild all indexes in a maintenance period?
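A rough sketch of such a script for plain Zope, assuming the Products.ZCatalog methods `indexes()`, `clearIndex()` and `reindexIndex()` plus the internal `_catalog.getIndex()`; please double-check them against your ZCatalog version. `rebuild_text_indexes` and its parameters are hypothetical names, and the function only relies on duck typing, so it can be tried with stand-in objects first:

```python
def rebuild_text_indexes(catalogs, commit=None, log=print):
    """Clear and reindex every ZCTextIndex in the given catalogs (sketch).

    `catalogs` is an iterable of (path, catalog) pairs, e.g. the result of
    app.ZopeFind(app, obj_metatypes=["ZCatalog"], search_sub=1).
    `commit` is called after each index (e.g. transaction.commit).
    Returns the (path, index_id) pairs that were rebuilt.
    """
    rebuilt = []
    for path, catalog in catalogs:
        for index_id in list(catalog.indexes()):
            index = catalog._catalog.getIndex(index_id)
            if getattr(index, "meta_type", None) != "ZCTextIndex":
                continue  # only text indexes are handled here
            log("rebuilding %s/%s" % (path, index_id))
            catalog.clearIndex(index_id)          # drop the migrated data
            catalog.reindexIndex(index_id, None)  # re-catalog every object
            if commit is not None:
                commit()  # one transaction per index
            rebuilt.append((path, index_id))
    return rebuilt
```

From a script where `app` is the Zope application root, it could be called as `rebuild_text_indexes(app.ZopeFind(app, obj_metatypes=["ZCatalog"], search_sub=1), commit=transaction.commit)`; if your catalogs use a different meta type, the filter needs adjusting. Committing after each index keeps the individual transactions shorter, which matters in production: a rebuild that runs for minutes in one transaction while other requests write to the same catalog is likely to end in a ConflictError at commit time.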

Can't you just rebuild the whole Catalog? This should be a step to do anyway when migrating, at least manually.
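Rebuilding whole catalogs could look roughly like this, assuming `ZCatalog.refreshCatalog(clear=1)`, which clears the catalog and then re-catalogs every object it has a path for. `rebuild_catalogs` is a hypothetical helper:

```python
def rebuild_catalogs(catalogs, commit=None, log=print):
    """Clear and fully rebuild each catalog (all indexes and metadata).

    `catalogs` is an iterable of (path, catalog) pairs, e.g. from
    app.ZopeFind(app, obj_metatypes=["ZCatalog"], search_sub=1).
    """
    done = []
    for path, catalog in catalogs:
        log("rebuilding catalog %s" % path)
        catalog.refreshCatalog(clear=1)
        if commit is not None:
            commit()  # e.g. transaction.commit, one transaction per catalog
        done.append(path)
    return done
```

This touches every index, not just the text indexes, so it is slower than rebuilding only the ZCTextIndexes, but it also catches any other index or metadata column that the conversion may have affected.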

The "WID encoded document content" is used only for so-called "phrase searches". Therefore, simple searches work, and the error manifests itself only with the first phrase search.
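A simplified model of why "test" works while "192.168.11" fails. As far as I can tell from the ZCTextIndex query parser, a query term that the lexicon splits into several words becomes a phrase node, and phrase nodes are what end up in `search_phrase` (the `executeQuery` line in the traceback). The `\w+` regex is an assumption standing in for the lexicon's word splitter, which may differ in your setup, and `query_kind` is a hypothetical helper:

```python
import re

# Assumption: a splitter that treats runs of word characters as words.
WORD_RX = re.compile(r"\w+")


def query_kind(term):
    """Return how a single query term would be searched (simplified)."""
    words = WORD_RX.findall(term.lower())
    if not words:
        return "ignored"
    if len(words) > 1:
        return "phrase"  # -> phrase node -> index.search_phrase()
    return "word"        # -> plain single-word lookup


for term in ["test", "192.168.11", "<1234.5678@mail.example.com>"]:
    print(term, "->", query_kind(term))
```

The last term is a made-up Message-ID; presumably an In-Reply-To value like that is also split into several words and therefore searched as a phrase, which would fit the first report in this thread.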

Thanks! That was the missing link.

I have one catalog per domain model, i.e. about 140 catalogs.

I know this is not standard, but that's the way I inherited the app. And yes, this has been problematic more than once, as quite a few code paths implicitly assume that there is only one catalog, or at least only a few.

So, now I know. Maybe this should be mentioned more prominently in the (Zope) upgrade guide. I don't know whether it appears in the Plone guide, or whether it gets handled there automatically.

Maybe some hints for the Python 3 migration: https://github.com/collective/collective.migrationhelpers/blob/master/src/collective/migrationhelpers/post_python3_fixes.py

We're using this package with great success in our migration projects.

Now this makes a lot of sense. It would have been odd if I were the first one with such problems.

Thanks! I guess I can adapt the script to work in plain Zope.

I guess so too :wink:

I have fixed two catalogs manually by clearing and rebuilding them.

But unlike the referenced script, I did not touch the Lexicons.

It seems to work well, though.

Can you give me a hint why the script also empties the Lexicons?

Maybe you could also give me a one-liner on how Lexicons and Catalogs are connected.

Thanks a lot!

The lexicon is used by text indexes (not directly by the catalog). Conceptually, it encapsulates the linguistic competence (things like word separators, normalization, stemming, ...). It maps words to (small) integers, which can typically be stored more efficiently than the original words.
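A toy model of that relationship, not the real `Lexicon`/`BaseIndex` classes: the lexicon assigns each word a small integer (wid), the index stores each document as its sequence of wids, and a phrase search looks for the phrase's wid sequence inside the stored one. The fixed-width `struct` packing is an assumption made for simplicity; the real index uses its own, more compact WidCode encoding. All names here are hypothetical:

```python
import struct

WIDTH = 4  # bytes per encoded wid in this toy encoding


class ToyLexicon:
    """Toy stand-in for a ZCTextIndex lexicon: maps words to small ints."""

    def __init__(self):
        self._wids = {}  # word -> wid (wids start at 1; 0 means "unknown")

    def word_ids(self, text, add=True):
        wids = []
        for word in text.lower().split():
            if word not in self._wids and add:
                self._wids[word] = len(self._wids) + 1
            wids.append(self._wids.get(word, 0))
        return wids


def encode(wids):
    """Pack a wid sequence into bytes (fixed width, big-endian)."""
    return b"".join(struct.pack(">I", wid) for wid in wids)


def contains_phrase(docwords, code):
    """Substring search that only accepts matches on a wid boundary."""
    start = docwords.find(code)
    while start >= 0:
        if start % WIDTH == 0:
            return True
        start = docwords.find(code, start + 1)
    return False


lexicon = ToyLexicon()
docwords = {
    1: encode(lexicon.word_ids("the quick brown fox")),
    2: encode(lexicon.word_ids("brown quick the fox")),
}


def search_phrase(phrase):
    code = encode(lexicon.word_ids(phrase, add=False))
    return [docid for docid, words in docwords.items() if contains_phrase(words, code)]


print(search_phrase("quick brown"))  # [1]
print(search_phrase("quick fox"))    # []
```

Because the stored documents only make sense together with the lexicon's word -> wid table, an index and its lexicon have to be kept consistent with each other.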

My guess: looking through the log entries of a large Plone migration (4.2 -> 5.2), I observed a systematic reindexing of the text indexes. I got the (possibly wrong) impression that the linguistics had changed significantly. In such a case, you may want to clear the lexicon to get rid of word -> wid mappings that are no longer used.
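If you do clear a lexicon, the order matters: every text index that uses it has to be cleared too before anything is reindexed, otherwise the remaining index data would refer to wids the fresh lexicon no longer knows. A sketch under these assumptions: `ZCTextIndex.getLexicon()` returns the index's lexicon, the lexicon offers a `clear()` method, the catalog provides `clearIndex()` / `reindexIndex()`, and no lexicon is shared with indexes outside this catalog. `rebuild_with_fresh_lexicons` is a hypothetical name:

```python
def rebuild_with_fresh_lexicons(catalog, commit=None):
    """Clear a catalog's ZCTextIndexes and their lexicons, then reindex."""
    text_ids = []
    lexicons = {}
    for index_id in list(catalog.indexes()):
        index = catalog._catalog.getIndex(index_id)
        if getattr(index, "meta_type", None) == "ZCTextIndex":
            text_ids.append(index_id)
            lexicon = index.getLexicon()
            # several indexes may share one lexicon; clear it only once
            lexicons[lexicon.getId()] = lexicon

    # Step 1: empty every affected index and lexicon before reindexing
    # anything, so no index keeps wids from the old lexicon.
    for index_id in text_ids:
        catalog.clearIndex(index_id)
    for lexicon in lexicons.values():
        lexicon.clear()

    # Step 2: reindex; the lexicons are refilled as documents are indexed.
    for index_id in text_ids:
        catalog.reindexIndex(index_id, None)
        if commit is not None:
            commit()
    return text_ids
```

With a commit after each reindex, the text indexes that have not had their turn yet are already committed as empty, so other connections may see incomplete search results until the loop finishes; committing only once at the end avoids that at the price of one long transaction.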