Zeopack fails raising KeyError

hvelarde · March 4, 2017, 10:52pm

We have a site distributed across two servers: server A runs ZEO as the ZRS master and server B runs ZEO as the ZRS slave.

it seems that for a couple of weeks the zeopack command has not been finishing; I tried it today on both servers and the command fails with a KeyError like this:

$ bin/zeopack 0
Traceback (most recent call last):
  File "bin/zeopack", line 42, in <module>
    sys.exit(plone.recipe.zeoserver.pack.main(host, port, unix, days, username, password, realm, blob_dir, storage))
  File "/opt/plone/example.com/eggs/plone.recipe.zeoserver-1.2.9-py2.7.egg/plone/recipe/zeoserver/pack.py", line 58, in main
    _main(*args, **kw)
  File "/opt/plone/example.com/eggs/plone.recipe.zeoserver-1.2.9-py2.7.egg/plone/recipe/zeoserver/pack.py", line 39, in _main
    cs.pack(wait=True, days=int(days))
  File "/opt/plone/example.com/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZEO/ClientStorage.py", line 916, in pack
    return self._server.pack(t, wait)
  File "/opt/plone/example.com/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZEO/ServerStub.py", line 155, in pack
    self.rpc.call('pack', t, wait)
  File "/opt/plone/example.com/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZEO/zrpc/connection.py", line 768, in call
    raise inst # error raised by server
KeyError: '\x1dx'

any hints?

davisagli · March 4, 2017, 11:33pm

This looks like a traceback where the ZEO client is re-raising an exception that originally occurred on the server. To find the original traceback you'll need to look in the ZEO server log.
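To illustrate why the client traceback stops at the `raise inst` line, here is a minimal sketch of the general pattern. It is not ZEO's actual code: `server_pack`, `server_handle` and `client_call` are hypothetical names, and pickle stands in for whatever wire format is really used. The point is only that the exception object travels to the client, while the frames where it was originally raised stay on the server.

```python
import pickle
import traceback


def server_pack():
    """Hypothetical stand-in for the server-side pack; fails with a KeyError."""
    index = {}
    return index["\x1dx"]


def server_handle():
    """Run the call on the 'server' and serialize the result or the exception."""
    try:
        return pickle.dumps((True, server_pack()))
    except Exception as exc:
        # Only the exception instance is serialized; the server-side
        # traceback is not part of the pickled data.
        return pickle.dumps((False, exc))


def client_call():
    """Hypothetical client stub: re-raise whatever the server sent back."""
    ok, value = pickle.loads(server_handle())
    if not ok:
        raise value  # error raised by server
    return value


try:
    client_call()
except KeyError as exc:
    error_args = exc.args
    # Names of the functions that appear in the client-side traceback.
    frames = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    print(frames)  # server_pack is not listed here
```

So the client output tells you what failed but not where, which is why the server log is the place to look.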

hvelarde · March 4, 2017, 11:58pm

thanks, David, this is what I found on the master:

2017-03-04T17:02:07 (::ffff:127.0.0.1:51715) pack(time=1488571327.685818) started...
2017-03-04T17:17:14 (1190) Error raised in delayed method
None
2017-03-04T17:17:14 (::ffff:127.0.0.1:51715) disconnected
2017-03-04T18:00:58 Unexpected error
Traceback (most recent call last):
  File "/home/cartacapital/cartacapital.portal.buildout/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/ConflictResolution.py", line 234, in tryToResolveConflict
    inst = klass.__new__(klass, *newargs)
TypeError: object.__new__(X): X is not a type object (BadClass)

on the slave:

2017-03-04T17:01:06 (::ffff:127.0.0.1:44257) pack(time=1488571266.607441) started...
2017-03-04T17:16:00 (33903) Error raised in delayed method
None

the TypeError doesn't seem to be related to the pack.

hvelarde · March 7, 2017, 2:26am

is the fsrecover.py script an option here?

davisagli · March 7, 2017, 3:12am

I don't know.

This looks like it might be related to trying to resolve a ConflictError if the class that conflicted is not present in the ZEO server's Python environment. Unfortunately the traceback doesn't indicate which class is involved (BadClass is some substitute created by the conflict resolving code). So you may need to add some debug logging in ConflictResolution.py. Or it may be enough to make sure that your project-specific package is included in the zeoserver's eggs.

I don't know much about the ZODB packing algorithm so don't know why that would trigger a ConflictError. If you ask on the ZODB list Jim may be able to point you in the right direction.

hvelarde · March 7, 2017, 5:17pm

thank you very much, David; I opened a thread on the ZODB list and, according to Jim, it seems we have some objects missing from the database.

he suggested adding pack-gc false to the <filestorage> section of the ZEO server configuration to disable garbage collection.

I did so, and I was able to pack the ZODB from 9GB down to 7GB.

I was wondering if it's possible to add that to the buildout configuration so I don't lose it on the next update.

Jim also suggested the following packages for trying the garbage collection and seeing if we can get more information on the missing object:

but they lack decent end-user documentation; has anybody here used them?

UPDATE: @alert just told me to use pack-gc = false on my ZEO server part according to plone.recipe.zeoserver.
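
For anyone landing here later, this is roughly what that looks like in buildout.cfg. It is only a sketch: the part name `[zeoserver]` is an assumption, so use whatever name your own part for plone.recipe.zeoserver already has and keep its other options as they are.

```ini
[zeoserver]
recipe = plone.recipe.zeoserver
# ... existing options of your ZEO server part ...
# skip the garbage collection phase when packing
pack-gc = false
```

After changing it, run buildout again so the ZEO server configuration is regenerated, then restart the ZEO server.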

cleberjsantos · March 9, 2017, 10:43am

Here is a good reference on zc.zodbdgc: http://www.zodb.org/en/latest/articles/multi-zodb-gc.html

hvelarde · March 9, 2017, 1:26pm

thank you very much, that document is awesome!

silviomarino · May 13, 2024, 10:58pm

I also get a key error on zeopack.

Traceback (innermost last):
Module ZPublisher.WSGIPublisher, line 162, in transaction_pubevents
Module ZPublisher.WSGIPublisher, line 371, in publish_module
Module ZPublisher.WSGIPublisher, line 274, in publish
Module ZPublisher.mapply, line 85, in mapply
Module ZPublisher.WSGIPublisher, line 63, in call_object
Module , line 3, in manage_pack
Module AccessControl.requestmethod, line 88, in _curried
Module App.ApplicationManager, line 386, in manage_pack
Module ZODB.DB, line 838, in pack
Module ZEO.ClientStorage, line 562, in pack
Module ZEO.asyncio.client, line 764, in call
Module ZEO.asyncio.client, line 743, in call
Module ZEO.asyncio.client, line 756, in wait_for_result
Module concurrent.futures._base, line 432, in result
Module concurrent.futures._base, line 384, in __get_result
KeyError: b'\x00\x00\x00\x00\x01?\xac\xd3'

I edited buildout.cfg and added pack-gc = false and pack-keep-old = false. I ran the buildout again and zeopack finished without errors. Now I'm not sure what to do. I believe I should restore the original buildout.cfg, but zeopack is faster and fixed the key error. What are the disadvantages of keeping pack-gc = false? Will I have problems in the future if I keep it like this?

davisagli · May 14, 2024, 8:40pm

@silviomarino pack-gc = false means that the garbage collection phase of packing will be skipped, so the database won't be packed as much as should be possible. It will still remove old copies of objects that were modified, but it won't find and remove objects that are no longer reachable from the root of the database.
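
A toy model may make the difference clearer. This is only a sketch under heavy assumptions, not how FileStorage actually packs: the `pack` function, the `ROOT` name and the dictionaries are made up for illustration, and it behaves like packing with 0 days (every old revision is dropped). It also shows one plausible way a missing object could surface as a KeyError only when garbage collection is on, since that is the only phase that follows references.

```python
# Toy model only: this is not ZODB's FileStorage code.
# Each object id maps to a list of revisions (oldest first);
# a revision is simply the list of object ids it references.
ROOT = "root"


def pack(storage, gc=True):
    """Return a packed copy of `storage`.

    Without gc: keep only the newest revision of every object.
    With gc: additionally drop objects not reachable from ROOT.
    """
    # Step 1 (always): drop old revisions of modified objects.
    packed = {oid: [revisions[-1]] for oid, revisions in storage.items()}
    if not gc:
        return packed

    # Step 2 (gc only): walk references starting at the root.
    reachable = set()
    todo = [ROOT]
    while todo:
        oid = todo.pop()
        if oid in reachable:
            continue
        # A reference to an object with no record raises KeyError here.
        current = packed[oid][0]
        reachable.add(oid)
        todo.extend(current)
    return {oid: packed[oid] for oid in reachable}


healthy = {
    "root": [["a"], ["a", "b"]],  # modified once, so two revisions
    "a": [[]],
    "b": [[]],
    "orphan": [[]],               # nothing references this object
}
damaged = {
    "root": [["a", "missing"]],   # "missing" has no record at all
    "a": [[]],
}

print(sorted(pack(healthy, gc=False)))  # the orphan survives
print(sorted(pack(healthy, gc=True)))   # the orphan is removed
print(sorted(pack(damaged, gc=False)))  # works, references are never followed
try:
    pack(damaged, gc=True)
except KeyError as exc:
    print("KeyError:", exc)
```

In this model, keeping gc off is harmless for the data that is still reachable; the cost is that unreachable objects stay in the file and keep taking up space.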

silviomarino · May 16, 2024, 1:34am

David, thanks for your explanation. In fact, when I removed pack-gc=false from buildout.cfg, zeopack failed again.
I'm running tests on a copy of the production environment. We use a ZEO installation of Plone 5.2.4.
I found some pages that explain how to use zodbverify to find the parent of the broken object, so that I can replace broken objects with dummies and get rid of them. But I can't run

$ bin/zodbverify -f var/filestorage/Data.fs

because zodbverify is not in the bin folder, and I didn't find anything in the plone/zodbverify GitHub repository ("Verify a ZODB by loading all records") indicating how to configure buildout.cfg to have zodbverify in the bin folder.
I appreciate any suggestions.

davisagli · May 16, 2024, 4:57am

@silviomarino You should be able to add zodbverify to the eggs section in buildout.cfg like this:

eggs =
    # ... other eggs...
    zodbverify

Then run buildout.