Image scale blobs not cleaned up by zeopack after removing ScalesDict

Update:

I got rid of almost all of the binary data in my ZODB by removing every reference that pointed to deleted objects.
I found references to deleted objects in:

  • RelationValue parent pointers (__parent__)
  • various attributes of the relation catalog
  • IntIds of RelationValues -> the object attribute of KeyReferenceToPersistent

The key point here is that I had big structures on the website, like nested folders with hundreds of objects, which remained in the DB because the top-level object was still referenced somewhere. In the meantime I have got rid of 100,000 objects. NamedImage/FileChunk objects obviously used the most space, but in terms of object count they were only the tip of the iceberg.
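Why a single stale reference keeps a whole subtree alive: a ZODB pack with garbage collection keeps every object that is reachable from the root and drops the rest, so reference cycles alone don't keep anything alive, but one reachable pointer does. A minimal sketch of that reachability rule, assuming a toy object graph modeled as a plain dict of oid -> referenced oids (the names are made up for illustration, not real ZODB data):

```python
from collections import deque


def reachable(graph, root="root"):
    """Return the set of oids reachable from `root` (the "mark" phase of a GC pack)."""
    seen = {root}
    queue = deque([root])
    while queue:
        oid = queue.popleft()
        for ref in graph.get(oid, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def garbage(graph, root="root"):
    """Everything a pack may remove: all oids that are not reachable."""
    return set(graph) - reachable(graph, root)


# Toy graph: folder "old" was deleted from the site, but a relation that is
# still indexed in the relation catalog keeps its __parent__ pointer to it.
graph = {
    "root": ["site"],
    "site": ["catalog"],
    "catalog": ["rel"],              # relation catalog still holds the relation
    "rel": ["old"],                  # rel.__parent__ -> deleted folder
    "old": ["img1", "img2", "rel"],  # folder content; "rel" closes a cycle
    "img1": ["blob1"],
    "img2": ["blob2"],
    "blob1": [],
    "blob2": [],
}

print(sorted(garbage(graph)))  # -> []
graph["catalog"] = []          # unindex the broken relation
print(sorted(garbage(graph)))  # -> ['blob1', 'blob2', 'img1', 'img2', 'old', 'rel']
```

The cycle between "rel" and "old" does not matter: once nothing reachable points at it, the whole subtree, blobs included, becomes collectable.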

Unindex relations that are effectively broken:

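        from zope.component import getUtility
        from zc.relation.interfaces import ICatalog

        # `self.portal` is the Plone site object.
        portal_catalog = self.portal.portal_catalog
        relation_catalog = getUtility(ICatalog)

        # Materialize the tokens first: modifying the catalog's BTrees
        # while iterating over them is unsafe.
        for token in list(relation_catalog.findRelationTokens()):
            rel = relation_catalog.resolveRelationToken(token)
            if not rel.__parent__:
                relation_catalog.unindex(rel)
            elif not portal_catalog.unrestrictedSearchResults(UID=rel.__parent__.UID()):
                relation_catalog.unindex(rel)
            elif not rel.from_object:
                relation_catalog.unindex(rel)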
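The three checks in the loop above can be pulled out into a plain predicate, which makes it easy to dry-run the decision (for example, to count broken relations) before actually unindexing anything. A minimal sketch; `FakeRelation` is a hypothetical stand-in for a RelationValue that only carries what the checks need:

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class FakeRelation:
    """Hypothetical stand-in for a RelationValue."""
    parent_uid: Optional[str]      # UID of rel.__parent__, None if unset
    from_object: Optional[object]  # resolved source object, None if gone


def is_broken(rel: FakeRelation, catalogued_uids: Set[str]) -> bool:
    """Same three checks as the loop above, in the same order."""
    if rel.parent_uid is None:               # no __parent__ at all
        return True
    if rel.parent_uid not in catalogued_uids:  # parent no longer catalogued
        return True
    return rel.from_object is None           # source object cannot be resolved


catalogued = {"uid-a"}
relations = [
    FakeRelation("uid-a", object()),  # healthy
    FakeRelation(None, object()),     # no __parent__
    FakeRelation("uid-b", object()),  # parent deleted from the catalog
    FakeRelation("uid-a", None),      # source object gone
]
print([is_broken(r, catalogued) for r in relations])  # -> [False, True, True, True]
```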

Iterate over literally every object in the storage and, if the object is no longer in the UID index, remove the references (RelationValue) from all of its relation fields.
I used the UID index as the single source of truth for which objects should stay in the DB.

    # Run with ./bin/instance run, which provides `app`.
    from zope.component.hooks import setSite
    from plone.uuid.interfaces import IUUID
    from plone.dexterity.interfaces import IDexterityContent
    from plone.dexterity.utils import iterSchemata
    from zope.schema import getFieldsInOrder
    from z3c.relationfield.interfaces import IRelation
    from z3c.relationfield.interfaces import IRelationChoice
    from z3c.relationfield.interfaces import IRelationList
    from z3c.relationfield import RelationValue
    import transaction
    import datetime

    plone = app.Plone  # the site id is "Plone" here
    setSite(plone)


    def remove_parent(obj, field):
        """Detach the relation(s) stored in `field` from `obj`."""
        value = field.get(field.interface(obj))
        if not value:
            return
        if isinstance(value, RelationValue):
            value.__parent__ = None
        else:
            for relation in value:
                if relation is not None:
                    relation.__parent__ = None


    def remove_refs(obj):
        """Clear every relation field (schema and behaviors) on `obj`."""
        for schemata in iterSchemata(obj):
            for name, field in getFieldsInOrder(schemata):
                if IRelation.providedBy(field) or IRelationChoice.providedBy(field):
                    remove_parent(obj, field)
                    field.set(field.interface(obj), None)
                if IRelationList.providedBy(field):
                    remove_parent(obj, field)
                    field.set(field.interface(obj), [])


    # Walk every record in the storage; content whose UID is not in the
    # catalog gets its relations removed.
    counter = 0
    for tx in plone._p_jar.db()._storage.iterator():
        for record in tx:
            counter += 1
            if counter % 100000 == 0:
                print(datetime.datetime.now(), counter)
                transaction.commit()
            obj = plone._p_jar[record.oid]
            if IUUID(obj, None) and IDexterityContent.providedBy(obj):
                if not plone.portal_catalog.unrestrictedSearchResults(UID=obj.UID()):
                    remove_refs(obj)
                    print('%r' % record.oid, obj)

    transaction.commit()
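One detail of the storage walk: the storage iterator yields one data record per stored revision, so on an unpacked storage the same oid shows up once for every transaction that touched it. A minimal sketch of the orphan-detection step with de-duplication, assuming the records have already been reduced to hypothetical (oid, uid) pairs, with uid None for objects that are not UUID-aware content:

```python
from typing import Iterable, Iterator, Optional, Set, Tuple


def find_orphans(
    records: Iterable[Tuple[bytes, Optional[str]]],
    catalogued_uids: Set[str],
) -> Iterator[Tuple[bytes, str]]:
    """Yield (oid, uid) for content whose UID is missing from the catalog.

    Each oid is handled only once, even if it appears in many transactions.
    """
    seen = set()
    for oid, uid in records:
        if oid in seen:
            continue
        seen.add(oid)
        if uid is not None and uid not in catalogued_uids:
            yield oid, uid


records = [
    (b"\x00\x01", "uid-a"),
    (b"\x00\x02", "uid-b"),
    (b"\x00\x02", "uid-b"),  # second revision of the same object
    (b"\x00\x03", None),     # e.g. a BTree bucket, not content
]
print(list(find_orphans(records, {"uid-a"})))  # -> [(b'\x00\x02', 'uid-b')]
```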

To clean up the IntIds, I borrowed the code from collective.relationhelpers/api.py (commit 4241db5596dfa2ec5948ea2a2f43396f04a0c53d, collective/collective.relationhelpers on GitHub).

I'm now running some analysis to see what's left and what caused the problem.
But it got rid of almost all the binary data in the ZODB and shrank my ZODB considerably.

Update:
I also had an attribute called event_information on some content, which stored the contents of a Zope lifecycle event(!). I'm almost 100% sure this came from custom code.

UPDATE 2

I can now reproduce the issue with a script on a vanilla Plone installation.

Environment:
Python 3.9.9
Plone 5.2.7

My buildout.cfg:

    [buildout]
    extends =
        http://dist.plone.org/release/5-latest/versions.cfg

    parts =
        instance
        zopepy

    [instance]
    recipe = plone.recipe.zope2instance
    user = admin:admin
    http-address = 8081
    eggs =
        Plone
        plone.app.mosaic


    [zopepy]
    recipe = zc.recipe.egg
    eggs =
        ${instance:eggs}
    interpreter = zopepy
    scripts = zopepy

    [versions]
    zc.buildout = 2.13.6
    setuptools = 51.3.3

Install:

    $ python3 -m venv .
    $ ./bin/pip install zc.buildout==2.13.6 setuptools==51.3.3
    $ ./bin/buildout

Make sure you start with an empty Data.fs (delete it if one exists).
Download the script prove_relation_value_gc_issue.py from GitHub and run it:

    $ ./bin/instance run prove_relation_value_gc_issue.py -s demo

It raises an error because relations are left in the DB, and with them an unwanted Document. The relations and the Document reference each other, and since the relations are still reachable, packing never removes them from the DB.

Instead of running the script, you can also verify the issue manually by running:

    ./bin/zopepy /path/to/ZODB-5.6.0-py2.7.egg/ZODB/scripts/analyze.py var/filestorage/Data.fs

After removing the second document (the one with the relation) and packing the DB, these objects were still present:

    ...
    z3c.relationfield.relation.RelationValue             1       162   0.0%  162.00
    ...
    plone.app.contenttypes.content.Document              3      4332   0.1% 1444.00
    ...
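To check for leftovers without reading the report by eye, the analyze.py output can be parsed. A minimal sketch, assuming the row layout seen in the excerpt above (class name, count, bytes, percentage, average size); the regex is my assumption, not a documented format of analyze.py:

```python
import re
from typing import Dict, Tuple

# Assumed row layout, based on the excerpt above:
# <class name> <count> <bytes> <percent>% <average size>
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+([\d.]+)%\s+([\d.]+)$")


def parse_analyze(output: str) -> Dict[str, Tuple[int, int]]:
    """Map class name -> (object count, total bytes) for every matching row."""
    result = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            result[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return result


sample = """
...
z3c.relationfield.relation.RelationValue             1       162   0.0%  162.00
...
plone.app.contenttypes.content.Document              3      4332   0.1% 1444.00
...
"""
counts = parse_analyze(sample)
print(counts["z3c.relationfield.relation.RelationValue"])  # -> (1, 162)
```

After all relation-bearing content has been deleted and the DB packed, any nonzero RelationValue count points to the leak described here.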

The issue is in z3c.relationfield/event.py (version 0.9.0, zopefoundation/z3c.relationfield on GitHub).
It sets the __parent__ attribute on relations via Zope lifecycle event handlers.
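To make the mechanism concrete, here is a hypothetical, heavily simplified model of the pattern. All class and function names are made up; this is not z3c.relationfield's code. A handler sets `__parent__` and indexes each relation, and unless a matching cleanup runs when the content is removed, the indexed relation keeps the removed object reachable:

```python
class RelationCatalog:
    """Hypothetical stand-in for the relation catalog / IntIds registry."""
    def __init__(self):
        self.relations = set()


class Relation:
    """Hypothetical stand-in for a RelationValue."""
    def __init__(self, to_object):
        self.to_object = to_object
        self.__parent__ = None


class Content:
    """Hypothetical content object with one relation list field."""
    def __init__(self, name):
        self.name = name
        self.related = []


def on_added(obj, catalog):
    """Mimics an added/modified handler: set __parent__ and index each relation."""
    for rel in obj.related:
        rel.__parent__ = obj
        catalog.relations.add(rel)


def on_removed(obj, catalog):
    """Cleanup that must run on removal, or the catalog keeps obj reachable."""
    for rel in obj.related:
        catalog.relations.discard(rel)
        rel.__parent__ = None


catalog = RelationCatalog()
target, source = Content("target"), Content("source")
source.related.append(Relation(target))
on_added(source, catalog)

# "Delete" source from its folder without calling on_removed():
print(source in {rel.__parent__ for rel in catalog.relations})  # -> True
on_removed(source, catalog)
print(len(catalog.relations))  # -> 0
```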

@zopyx I'm pretty sure you had content with relations and images, and somewhere in that structure there are "leftover" objects that cannot be garbage collected.
