[Solved] How can I get the object ID or the path of a blob file?

1letter · July 30, 2020, 2:57pm

The blobfile lives here:

/Plone5/py2/var/blobstorage/0x00/0x00/0x00/0x00/0x00/0xa7/0xe1/0x82/0x03d9906455146bcc.blob

How can I find this blob file in my portal catalog or in my site tree? Can I convert the "0x..." parts into an oid, or what is the right way to resolve the physical path to the object?

Update:
I now have a browser view to check my candidates:

from plone import api
from Products.Five.browser import BrowserView
from ZODB.blob import FilesystemHelper


class Clean(BrowserView):

    def __call__(self):
        portal = api.portal.get()
        app = portal.aq_parent
        fshelper = FilesystemHelper("/Plone5/py2/var/blobstorage")
        # blob directories of the candidates (the oid part of the path)
        mylist = [
            '/Plone5/py2/var/blobstorage/0x00/0x00/0x00/0x00/0x00/0xa7/0xce/0x04',
            '/Plone5/py2/var/blobstorage/0x00/0x00/0x00/0x00/0x00/0xa7/0xe9/0xc4',
            '/Plone5/py2/var/blobstorage/0x00/0x00/0x00/0x00/0x00/0xa7/0xc4/0xb4'
        ]

        for oid, oidpath in fshelper.listOIDs():
            if oidpath in mylist:
                # load the raw Blob object by its oid
                obj = app._p_jar.get(oid)
                # obj : <ZODB.blob.Blob object at 0x7fa4afd73350 oid 0xa7c4b4 in <Connection at 7fa4b4e0fc10>>
                # how to continue ?
        return 'ok'

dieter · July 30, 2020, 4:37pm

You cannot easily.

The hex values specify the oid and the serial (first and second 8 bytes) of the object. With the oid, you can load the object from the ZODB (as you did). But, this gives you the raw object -- without its context, e.g. its location.
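To make the layout concrete, here is a minimal sketch of turning such a path back into an oid and a serial. It assumes the directory layout seen in this thread (one "0xNN" directory per oid byte, then "<serial>.blob"); split_blob_path is a hypothetical helper name, not part of ZODB.

```python
import binascii
import os


def split_blob_path(path, base_dir):
    """Return (oid, serial) as 8-byte strings for a blob file path.

    Hypothetical helper; assumes one "0xNN" directory per oid byte
    below base_dir and a file named "0x<serial>.blob".
    """
    parts = os.path.relpath(path, base_dir).split(os.sep)
    # The last component is the file name, everything before it is the oid.
    filename = parts.pop()
    oid = binascii.unhexlify("".join(p[2:] for p in parts))
    # Drop the ".blob" extension and the leading "0x" to get the serial.
    serial = binascii.unhexlify(os.path.splitext(filename)[0][2:])
    return oid, serial
```

The oid returned here is the value you would pass to app._p_jar.get(oid), as in the browser view above.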

For a client, I have implemented a small script which visits all File objects and outputs a map "location -> blob-oid". A similar script would allow you to locate your blobs based on their oid.

jaroel · July 30, 2020, 5:33pm

Something like this:

for brain in context.portal_catalog(portal_type='File'):
    obj = brain.getObject()
    path_on_disk = obj.file._blob._p_blob_committed
    url = obj.absolute_url()

I'm unaware of anything that does the reverse, as the ZODB object doesn't have __parent__ pointers:

>>> obj = app._p_jar.get(b'\x00\x00\x00\x00\x00\x00\x1a6')
>>> obj
<ZODB.blob.Blob object at 0x11575c5d0 oid 0x1a36 in <Connection at 114bb1a10>>
>>> obj.__parent__
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'Blob' object has no attribute '__parent__'
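Since the blob cannot tell you where it lives, one workaround is to walk the catalog once and build the reverse map yourself. This is only a sketch: build_blob_index is a made-up name, and it assumes every File object stores its data in obj.file with the blob in ._blob, as in the loop above.

```python
def build_blob_index(brains):
    """Map blob oid -> content path for the given catalog brains.

    Hypothetical helper; assumes obj.file._blob holds the ZODB blob.
    """
    index = {}
    for brain in brains:
        obj = brain.getObject()
        blob = getattr(obj.file, "_blob", None)
        if blob is None:
            continue  # no file uploaded, or not blob-backed
        # _p_oid is available even while the blob is still a ghost.
        index[blob._p_oid] = brain.getPath()
    return index
```

With index = build_blob_index(context.portal_catalog(portal_type='File')), a lookup like index.get(oid) returns the content path for an oid taken from the blob storage, or None if no File object references it.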

dieter · July 30, 2020, 7:21pm

Yes. I did not know about _p_blob_committed yet, but if it gives you the path, it is perfect.

1letter · July 31, 2020, 6:49am

Strange behavior: is there some kind of lazy-loading mechanism?

brains = catalog(portal_type=['File'])
obj = brains[0].getObject()        
blobfile = obj.file
blob = blobfile._blob
import pdb
pdb.set_trace()

# first attempt
(Pdb++) pp blob._p_blob_committed 
None

(Pdb++) dir(blob)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getstate__', '__hash__', '__implemented__', '__init__', '__module__', '__new__', '__providedBy__', '__provides__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_create_uncommitted_file', '_p_activate', '_p_blob_committed', '_p_blob_ref', '_p_blob_uncommitted', '_p_changed', '_p_deactivate', '_p_delattr', '_p_estimated_size', '_p_getattr', '_p_invalidate', '_p_jar', '_p_mtime', '_p_oid', '_p_serial', '_p_setattr', '_p_state', '_p_status', '_p_sticky', '_uncommitted', 'closed', 'committed', 'consumeFile', 'open', 'opened', 'readers', 'writers']

# second attempt
(Pdb++) pp blob._p_blob_committed 
u'/Plone5/py2/var/blobstorage/0x00/0x00/0x00/0x00/0x00/0xaa/0x68/0xbc/0x03d9906455146bcc.blob'

My solution is to open and close the file:

brains = catalog(portal_type=['File'])
obj = brains[0].getObject()        
blobfile = obj.file
blob = blobfile._blob
f = blob.open('r')
f.close()
print "{} {}".format(blob._p_blob_committed, obj.absolute_url())
"""
/Plone5/py2/var/blobstorage/0x00/0x00/0x00/0x00/0x00/0xaa/0x68/0xbc/0x03d9906455146bcc.blob
http://127.0.0.1:20580/.../xxx.jpg
"""

Now I have both the real filesystem path and the object URL.
Thanks for the help, guys!

Background: when I rebuild the catalog, some very old Word files crash wvText with a segmentation fault. I have a script which collects all the broken Word files. I will investigate what is wrong with these files, but for that I need the object path.

#!/bin/bash
# the bash script to find the broken Word files
find /Plone5/py2/var/blobstorage/ -type f -exec sh -c "file -i {} | grep 'word' | sed 's/:.*//'" \; | while read -r I ; do
    # a crash (e.g. a segfault) shows up as a non-zero exit status
    wvText "$I" /tmp/wv.txt
    RESULT=$?
    if [[ $RESULT -ne 0 ]]
    then
      echo "$I" >> wvfiles.txt
    fi
done

jaroel · July 31, 2020, 7:56am

Accessing _p_blob_committed doesn't un-ghost the object because of the _p_ prefix?
You'll have to activate the object first (https://zodb.readthedocs.io/en/latest/api.html#transaction.IPersistent._p_activate): blob._p_activate().

1letter · July 31, 2020, 8:23am

Yes, that does the job.

brains = catalog(portal_type=['File'])
obj = brains[0].getObject()        
blobfile = obj.file
blob = blobfile._blob
blob._p_activate()   
print "{} {}".format(blob._p_blob_committed, obj.absolute_url())
blob._p_deactivate()
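For the background problem (matching the paths collected in wvfiles.txt to content objects), the same steps could be wrapped up roughly like this. It is an untested sketch: committed_path and find_broken are made-up names, and it assumes the paths in wvfiles.txt are written exactly as _p_blob_committed reports them.

```python
def committed_path(blob):
    """Return the blob's committed file path, activating the ghost first."""
    blob._p_activate()
    try:
        return blob._p_blob_committed
    finally:
        # Turn the blob back into a ghost once the path has been read.
        blob._p_deactivate()


def find_broken(brains, broken_paths):
    """Yield (blob path, URL) for each object whose blob is in broken_paths."""
    broken = set(broken_paths)
    for brain in brains:
        obj = brain.getObject()
        blob = getattr(obj.file, "_blob", None)
        if blob is None:
            continue  # no file uploaded, or not blob-backed
        path = committed_path(blob)
        if path in broken:
            yield path, obj.absolute_url()
```

The broken paths would come from the file written by the bash script, e.g. [line.strip() for line in open("wvfiles.txt")], and brains from catalog(portal_type=['File']).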

Thanks.