[SOLVED] IndexError: list index out of range Products.ZCatalog.Catalog

ZCatalog - 2.13.29 (Plone 4)

This bug emerged as a report of "When I search for 'an object' Plone gives me an error". The traceback indicates an IndexError at the line shown below:
max = float(rs[0][0])

I have discovered that this happens on a search in which the search string is so specific that the final result, after searching all indexes, is a single object.

This annoyingly happens when using a reference browser widget and knowing exactly what you want to find, mostly by cutting and pasting a URL into the search, which pretty much guarantees a single result.

When:

  1. No sort index is provided to the query
  2. The 'SearchableText' index is used.
  3. OkapiIndex supplies a NEGATIVE WEIGHT to all the objects returned

Line 666 tosses out all results with a negative weight.
Line 667 assumes there is at least one result, and throws the exception when there is none.
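
The failure mode described above can be reproduced in isolation. This is only a simplified stand-in, not the actual Products.ZCatalog source: the function name normalize_scores and the sample data are made up for illustration, and it assumes negative scores are dropped before the top score is read.

```python
def normalize_scores(scored):
    """Illustrative stand-in for the scoring step, not the real Catalog code.

    scored is a list of (score, docid) pairs.
    """
    # Drop negative scores and sort best-first (the effect described for line 666).
    rs = sorted([(s, d) for s, d in scored if s >= 0], reverse=True)
    # Equivalent of line 667: assumes rs holds at least one entry.
    top = float(rs[0][0])
    return [(s / top, d) for s, d in rs]


# Works when at least one score is positive.
print(normalize_scores([(2.0, 1), (1.0, 2)]))

# A single hit with a negative score leaves rs empty, so rs[0] fails.
try:
    normalize_scores([(-0.3, 42)])
except IndexError as exc:
    print("IndexError:", exc)
```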

As I have dug deeper into why you can have a negative weight, I found myself spelunking into the OkapiIndex implementation.

I have come out of that finding that the "total # of words in all docs" (self._totaldoclen) is negative.

If I manually ask for the total doc length with pdb inside OkapiIndex._search_wids:
sum(self._docweight.values())
It will return a positive, and sane, value.
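
To see why a negative total doc length can turn scores negative, here is a rough sketch of the textbook BM25 term score. It is not copied from OkapiIndex or okascore; the constants K1 and B are the commonly used defaults and are an assumption here, as are the function name and every number in the example.

```python
K1 = 1.2   # assumed: common BM25 default
B = 0.75   # assumed: common BM25 default


def bm25_term_score(tf, doc_len, total_doc_len, num_docs, idf=1.0):
    """Textbook BM25 score for one term in one document (illustrative only)."""
    mean_doc_len = total_doc_len / float(num_docs)
    # Length normalization: goes negative when mean_doc_len is negative
    # and the document is long enough.
    len_weight = (1.0 - B) + B * doc_len / mean_doc_len
    return idf * tf * (K1 + 1.0) / (tf + K1 * len_weight)


# Invented numbers: 1000 docs, one 300-word doc, term appears once.
good = bm25_term_score(tf=1, doc_len=300, total_doc_len=100000, num_docs=1000)
bad = bm25_term_score(tf=1, doc_len=300, total_doc_len=-100000, num_docs=1000)
print(good > 0, bad < 0)
```

With a sane total the length weight is positive and so is the score. With a negative total the mean doc length is negative, the denominator can drop below zero, and the score comes out negative.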

This bug happens regardless of whether the C optimization is used to determine the weights. At this time, I think that the okascore.score() algorithm is not the culprit, but I have not eliminated it entirely, especially if it updates the index using OkapiIndex._change_doc_len(delta).

If anyone has experience inside the OkapiIndex or has seen negative scores in SearchableText indexes, any advice would be appreciated.

I'm posting here before opening a bug in ZCTextIndex on Github, because it's the 'right thing to do'.

I'll update the thread if I find the fix or root cause. Currently, the assumption is that we did something goofy, because that's always the most likely cause.

./bin/instance debug

>>> idxs = app.Plone.portal_catalog.Indexes
>>> idx = idxs['SearchableText']
>>> idx._index_type

'Okapi BM25 Rank'

>>> idx.index._totaldoclen.value

-19826426L

Is this weird? I think it's weird. Maybe it's normal. I don't think it's normal, but I don't know.

>>> idx.index._docweight

<BTrees.IOBTree.IOBTree object at 0x7feb29000000>

>>> sum(idx.index._docweight.values())
3324419

The code comments say that this sum is really the correct # for total doc length. It takes about 500ms to compute on my development box.
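
The same comparison can be wrapped in a small helper for checking other catalogs. This is just a convenience sketch: the name doclen_drift is made up, and it relies only on the two attributes used in the session above (_totaldoclen.value and _docweight.values()).

```python
def doclen_drift(okapi_index):
    """Return (stored, recomputed) total document length for an index.

    stored is the cached counter; recomputed is the sum of all
    per-document weights, which is slower but derived from the data.
    """
    stored = okapi_index._totaldoclen.value
    recomputed = sum(okapi_index._docweight.values())
    return stored, recomputed
```

In the debug session this would be called as doclen_drift(idx.index); if the two numbers differ, the cached counter has drifted.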

Just reindexed the SearchableText index and tried again:

>>> idx.index._totaldoclen.value

-23150845L

That's pretty cool.

The sum of all docweights remained the same: 3324419

Got it.

But since we still use the DEPRECATED Products.ZCTextIndex, we never got the 2.13.3 update (we are at 2.13.2).

And it seems that this is now maintained in Products.ZCatalog anyway.

Thanks for taking this journey with me. I'm going to talk with our sysadmin about upgrade paths.