[SOLVED] A different document with value '...' already exists in the index

A different document with value '...' already exists in the index.

...and after this warning/error I have a broken brain in the catalog that won't update no matter what I do to the object. My issue is with review_state: the brain remained in pending, while the object is now published but does not appear in listings.

I also tried to manually uncatalog the object via the ZMI and the debugger, reindex it, etc. No luck.

Removing the object and manually uncataloging it may help, but that's not an option :slight_smile:

I wonder what could have led to duplicate UIDs in portal_catalog, and whether there are any tools to clean or debug large catalogs without having to rebuild the entire catalog?

I have seen this before, and I think a clear and rebuild of the catalog helped to fix it.

I think we had some similar situation in Plone 4 with some discussion items.

I would say we fixed them by clearing and rebuilding the catalog, but I have no idea why this happened.

Alin Voinea via Plone Community wrote at 2022-2-15 12:00 +0000:

...
I wonder what could have led to duplicate UIDs in portal_catalog, and whether there are any tools to clean or debug large catalogs without having to rebuild the entire catalog?

There are some situations when an error during index operations
does not lead to a transaction abort. In those cases, the catalog
(and its indexes) can become inconsistent.

It might (or might not) be sufficient to reindex the index that reported the problem.

True, it would help if the index name would be in the error log.

Alin Voinea via Plone Community wrote at 2022-2-15 18:15 +0000:

True, it would help if the index name would be in the error log.

Only very few indexes (for example the "uid_index") check uniqueness
of index values.
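To illustrate why a stale entry in a uniqueness-checking index can block later updates, here is a toy model in plain Python. It is not the actual index code from Zope or Plone: the class `ToyUniqueIndex` and its methods are hypothetical, and the assumption that a conflicting write is logged and skipped rather than raised is only a guess at the behaviour described in this thread.

```python
import logging

logger = logging.getLogger("toy.uidindex")


class ToyUniqueIndex:
    """Toy model of an index that allows one document id per value."""

    def __init__(self):
        self._forward = {}  # value -> document id

    def index(self, value, doc_id):
        """Store value -> doc_id; return True if the entry is in place."""
        existing = self._forward.get(value)
        if existing is None:
            self._forward[value] = doc_id
            return True
        if existing != doc_id:
            # Assumption: the write is refused without raising, so a
            # surrounding transaction would still commit.
            logger.error(
                "A different document with value %r already exists "
                "in the index.", value)
            return False
        return True

    def unindex(self, value, doc_id):
        """Remove the entry only if it belongs to doc_id."""
        if self._forward.get(value) == doc_id:
            del self._forward[value]


index = ToyUniqueIndex()
index.index("abc123", 1)          # first document claims the value
stale = index.index("abc123", 2)  # second document with the same value is refused
```

In this model the second document can only be indexed after the first entry has been removed, which is why clearing the index before reindexing can make a difference.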

True. Still, reindexing UID didn't do the trick. Maybe I should clean it first :slight_smile:

Alin Voinea via Plone Community wrote at 2022-2-16 09:54 +0000:

True. Still, reindexing UID didn't do the trick. Maybe I should clean it first :slight_smile:

I think reindexing means "clear" then "reindex".
But, you can try explicit clearing anyway.
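A minimal sketch of the explicit "clear, then reindex" step, for use from a debug session instead of the ZMI buttons. The helper name `clear_and_reindex` is made up for illustration. It assumes the catalog object offers `manage_clearIndex(ids=...)` and `manage_reindexIndex(ids=...)`, as ZCatalog does; please check this against your own Zope/Plone version before relying on it.

```python
def clear_and_reindex(catalog, index_names):
    """Clear the named indexes, then rebuild them from the cataloged objects.

    `catalog` is assumed to be a ZCatalog-like object (e.g. portal_catalog).
    Returns the list of index names that were processed.
    """
    names = list(index_names)
    # Step 1: drop all entries, including stale ones, from these indexes.
    catalog.manage_clearIndex(ids=names)
    # Step 2: repopulate the same indexes from the cataloged objects.
    catalog.manage_reindexIndex(ids=names)
    return names


# Hypothetical usage in a debug session (the portal name is an assumption):
#   import transaction
#   clear_and_reindex(app.Plone.portal_catalog, ["UID"])
#   transaction.commit()
```

While the rebuild is running, queries on the cleared indexes return incomplete results, so it is better tried on a staging copy first.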

We have seen this issue on a portal where subsites were exported using zexp and afterwards re-imported at a different location/folder in the portal (the situation was a bit more complicated; simplified here for brevity). You can then get duplicate UIDs.

@mauritsvanrees improved collective.catalogcleanup last year to also check for these UID & other problems.

But now that you mention discussion items: I have no idea... what happens if you copy/paste a CT item that has discussion items 'attached'? Are these discussion items dropped? Do they have UIDs? Are these recalculated?
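For a quick read-only check for the duplicate-UID situation described above, something like the following could be run over all brains. This is only a sketch: `find_duplicate_uids` is a hypothetical helper, and it assumes each brain exposes a `UID` metadata attribute and a `getPath()` method, as catalog brains in a default Plone site usually do.

```python
from collections import defaultdict


def find_duplicate_uids(brains):
    """Map each UID found on more than one path to its sorted paths."""
    paths_by_uid = defaultdict(set)
    for brain in brains:
        # Assumption: UID is available as a metadata column on the brain.
        uid = getattr(brain, "UID", None)
        if uid:
            paths_by_uid[uid].add(brain.getPath())
    return {
        uid: sorted(paths)
        for uid, paths in paths_by_uid.items()
        if len(paths) > 1
    }


# Hypothetical usage in a debug session:
#   brains = portal.portal_catalog.unrestrictedSearchResults()
#   for uid, paths in find_duplicate_uids(brains).items():
#       print(uid, paths)
```

It only reports candidates and changes nothing, so it should be safe to run on a large catalog, although it does walk every brain.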

Thank you. I will give collective.catalogcleanup a try :wink:

collective.catalogcleanup didn't help.

It worked by:

  • Clearing the review_state and path indexes and reindexing them via the ZMI. Then I re-published the object.
  • Number of indexed objects: ~78k. Duration: ~4 min.
  • Drawbacks:
    • Listings with filters based on review_state or path will be empty during the reindex.
    • You may have to tweak the Subtransaction threshold via ZMI > portal_catalog > Advanced if you have a site with a lot of editors, or even close the site for authenticated users during this operation. Thus, never apply this directly on production :smiley: . Always try it first on staging.
    • I don't think the issue is completely fixed, as some other indexes may still have the wrong data (that's why I also reindexed the path index, as the broken brain was still visible in the ZMI when filtering by path).
[waitress-3] 70000/77540 (90.28%) Estimated termination: 2022/02/19 12:17:29h
[waitress-3] committing subtransaction
[waitress-3] Process terminated. Duration: 213.78 seconds
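To verify after such a reindex that no brain still carries an outdated review_state, a comparison along these lines might help. It is a sketch under assumptions: `find_stale_review_states` is a hypothetical helper, brains are assumed to expose `review_state`, `getPath()` and `getObject()`, and the caller supplies `get_state`, for example a small wrapper around `portal_workflow.getInfoFor(obj, "review_state", None)`.

```python
def find_stale_review_states(brains, get_state):
    """Return (path, cataloged_state, actual_state) tuples that disagree.

    `get_state` is a caller-supplied callable that returns the current
    workflow state of an object.
    """
    stale = []
    for brain in brains:
        try:
            obj = brain.getObject()
        except (AttributeError, KeyError):
            # Assumption: a brain whose object is gone fails like this.
            stale.append((brain.getPath(), brain.review_state, None))
            continue
        actual = get_state(obj)
        if brain.review_state != actual:
            stale.append((brain.getPath(), brain.review_state, actual))
    return stale


# Hypothetical usage in a debug session:
#   wf = portal.portal_workflow
#   brains = portal.portal_catalog.unrestrictedSearchResults()
#   get_state = lambda obj: wf.getInfoFor(obj, "review_state", None)
#   print(find_stale_review_states(brains, get_state))
```

Waking up every object is slow on a catalog of this size, so restricting the query to the affected folder first is probably a good idea.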

This script may find and fix some inconsistencies in the UID index, plus possibly the intids catalog: