How to prevent zope/zeo starting if filestorage not found?

We recently had an issue where our ZEO server restarted. The blobs volume mounted fine, but the volume holding the filestorage didn't. As a result, new, empty filestorage files were created. Zope then proceeded to clean up the blobs that weren't referenced in the database, so they all got deleted.

I'm thinking there must be some kind of setting that prevents this from happening.

The create=false option seems like it should be what I want, but according to the code the file still gets created rather than its creation being prevented - https://github.com/zopefoundation/ZODB/blob/master/src/ZODB/FileStorage/FileStorage.py#L276.

Looking at the code, maybe the best I can do is to have read-only versions of the database files on my main disk before I mount, so that if the mount fails Zope will raise an error?

Would a shell script suffice?
i.e. something like [[ -d /mnt/storage/my_filestorage ]] && bin/zeoctl start
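
A slightly stricter version of the same pre-start check, as a minimal Python sketch. It assumes a POSIX system; the function name and the paths in the usage note are hypothetical, and nothing here is part of Zope or ZEO. It checks that the volume is really mounted and that each database file exists and is non-empty.

```python
import os


def missing_storage_paths(mount_point, required_files):
    """Return a list of problems; an empty list means it looks safe to start.

    Hypothetical helper for a wrapper script, not a Zope/ZEO API.
    """
    problems = []
    # A directory can exist without the volume being mounted on it,
    # so test the mount itself and not just the path.
    if not os.path.ismount(mount_point):
        problems.append(f"{mount_point} is not a mount point")
    for path in required_files:
        # An empty Data.fs is as suspicious as a missing one.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(f"{path} is missing or empty")
    return problems
```

A wrapper script could call, for example, missing_storage_paths("/mnt/storage", ["/mnt/storage/my_filestorage/Data.fs"]) and only run bin/zeoctl start when the returned list is empty.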

I had a similar issue with the blobstorage folder recently. In one second I lost > 300 GB of files (god bless backups).

The problem was similar: it seemed that at the moment the script started, it could not see the blobstorage folder (on a mount point), so ZEO recreated an empty one over it, erasing everything.

This was a network problem, and I still don't know how it was possible to overwrite a folder that way with a mkdir command.

I patched the code to raise an exception and stop instance startup instead of creating new folders.

There's clearly a race condition between L265:

and L274

If there's a network problem, the open will stall and, after a timeout, return an error, which sets create = 1. Meanwhile the network comes back, but create is still 1. And then L286 happens:

I think a conservative fix is to remove create = 1 in L274 (and maybe L283). L286 is also wrong, because it tries to serve two different cases (create passed as a parameter to the init function, and create set inside the function).

L286 should run ONLY if the parameter has been passed explicitly. So use a different variable to set to 1 inside the function (e.g. create_missing = 1) in L276 and L283.

Then you can safely keep os.remove(file_name) guarded by the create = 1 condition (and not by create_missing), because in that case you're really asking for a recreation, whereas for create_missing you just create a file. If the network came back in the meantime, the create would fail and tell you that the file exists, so you can continue.
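
To make the idea concrete, here is a rough sketch of the separation I mean. This is not the actual ZODB code: open_storage_file is a hypothetical stand-alone function, and it only assumes that a missing file shows up as FileNotFoundError. Any other error (a timeout, for example) is left to propagate so startup stops.

```python
import os


def open_storage_file(file_name, create=False):
    """Hypothetical sketch; not FileStorage's real implementation."""
    create_missing = False  # set only inside the function, never by the caller
    fh = None
    if not create:
        try:
            fh = open(file_name, "r+b")
        except FileNotFoundError:
            # The file looked missing. Remember that separately from `create`.
            create_missing = True
    if create:
        # Explicit request from the caller: deletion is allowed only here.
        if os.path.exists(file_name):
            os.remove(file_name)
        fh = open(file_name, "w+b")
    elif create_missing:
        try:
            # Exclusive creation fails if the file exists, so nothing is
            # ever truncated or deleted on this path.
            fh = open(file_name, "x+b")
        except FileExistsError:
            # The file appeared in the meantime (e.g. the mount came back):
            # use it as it is.
            fh = open(file_name, "r+b")
    return fh
```

The point of the exclusive "x" mode is that the create_missing path can never destroy existing data, even if the file reappears between the failed open and the create.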

In general, I would remove the deletion and move it to an explicit external utility, so you're really sure the user wants to remove the file if it already exists. Better still would be to not permit deletion at all and let the sysadmin do it outside Zope.
