How big is a blob-file?

I am a bit confused about the size of Data.fs

I have a site with many images (2500), files (137 PDFs and Word documents) and blobs (blob fields), plus some 'pages' (about 130, but long). All content is Dexterity.

The database has just been packed, after removing most revisions with collective.revisionmanager, and is 2750 MB.

  • How big are blob files in the Plone 5.1.x database? What if I have a 1 MB image? What if I have a 30 MB PDF?
  • In the ZMI, I can see some sizes. Does this size mean that the image is 5.76 MB and that the size in Plone is much smaller? Or could this be some old image that was never 'migrated to blob'?

Binary data stored as blob inside blob-storage are stored 1:1 on the filesystem.

But how big are they in the Plone database?
Apart from the catalog indexes, what is stored?
Are any 'previews' stored in the database (a thumb of an image?)?

In my case: I find it a bit strange that the database is quite big with so few (other) items.

The persistent blob object is only a small Python object, basically a reference holding the filename.

If in doubt, start the debug console, retrieve e.g. the image field from an image object, and pickle it. The result should not be large, perhaps a few hundred bytes or so...
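To illustrate why the pickle stays small, here is a minimal sketch in plain Python. It does not use the ZODB: `blob_reference` is a hypothetical stand-in (a plain dict with made-up values) for the small persistent object that only holds a filename and some metadata, and `payload` stands in for the binary data that lives in the blob file.

```python
import pickle


def pickled_size(obj):
    """Return the number of bytes the object occupies when pickled."""
    return len(pickle.dumps(obj))


# Hypothetical stand-in for the persistent reference object: it only
# carries a filename and a content type, not the image data itself.
blob_reference = {"filename": "photo.jpg", "content_type": "image/jpeg"}

# Stand-in for the binary content that would live in the blob file (5 MB).
payload = b"\x00" * (5 * 1024 * 1024)

reference_size = pickled_size(blob_reference)
payload_size = pickled_size(payload)

print("reference pickle:", reference_size, "bytes")
print("payload pickle:  ", payload_size, "bytes")
```

The reference pickles to well under a kilobyte no matter how large the payload is, which is the behaviour one would expect to see for the real field value in the debug console.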

The ZODB has "blob" support. It allows storing "Binary Large OBject"s as-is, directly on the file system (rather than as a pickle in the ZODB's primary storage). Thus, if you have an image of 10 MB, the corresponding blob will be 10 MB: the blob contains exactly the image data. In addition, the primary storage contains a small blob representative (= proxy) with the administrative data (object id, serial) necessary for the linkage between the primary storage and the blob and for transaction control.

Note that while the main content is in a blob, the primary storage can still have significant information about the object: metadata, indexing data, etc.

Thanks. This is what I thought, I just thought the database was quite big (?) so I was wondering if something else was saved in the database (not in the blob-file). (Like a small thumb).

PS: I did notice a long time ago that if I missed PIL dependencies, Plone still showed a 'thumb-sized' image, while all the other scales were missing.


So I assume this means that basically, all blob images (and blob files) take up about the same size in the database / Data.fs.


Is 2750 MB for (about) 2750 objects "normal"? Sounds a bit high to me (note that I purged all history and packed the database).
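One way to put numbers on this is to measure the blob directory and Data.fs separately and work out an average per object. The following is only a sketch: the helper names are made up for illustration, and the paths in the usage note are the usual buildout defaults, which may differ on your installation.

```python
import os


def tree_size(root):
    """Sum the sizes in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that nothing is counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def average_mb_per_object(total_bytes, object_count):
    """Average storage per object in MB (1 MB = 1024 * 1024 bytes)."""
    return total_bytes / object_count / (1024 * 1024)


# The figures from this thread: 2750 MB spread over about 2750 objects.
print(average_mb_per_object(2750 * 1024 * 1024, 2750), "MB per object")
```

For example, `tree_size("var/blobstorage")` next to `os.path.getsize("var/filestorage/Data.fs")` shows how much of the data really lives in blobs. An average of about 1 MB per object in Data.fs alone would suggest that some binary data is still stored in the primary storage.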


By the way: Sorry for the 'confusing subject'. Probably it should have been 'How big is a blob-file in Data.fs' or something similar

One remark: image scales of an image are not stored as blobs. They are stored as annotations.

So if you have a site that does not use most of the image scales (like mine, where I only use preview, large and a custom 'full' scale), should I delete the others (listing, icon, tile, mini)?

When you scale an image, PIL/Pillow will actually create a new file, which is what is stored in annotations. plone.scale does some cleanup when the image is modified (see https://github.com/plone/plone.scale/blob/master/plone/scale/storage.py#L223). But those scales are only created on request (if not already in the annotations): if no one is trying to view a thumb scale of an image, it's not going to be in your annotations (and a thumb-size image is probably a negligible amount of data anyway). I would think it unlikely that the scales are contributing a large footprint in your database.

If you stop using certain (custom) image scales in your site, the annotations will not always be cleaned up. You can check (and remove) all image scales using this script:

We've been using this in production on some sites for several years, but the usual warning applies: first try it on a testing environment if you don't use dry run. :wink:
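As a rough illustration of the idea (this is not the script mentioned above), here is a minimal sketch with a dry-run switch. It works on any dict-like annotations object. Two assumptions: the scales are kept under the annotation key `plone.scale`, and in a real site the mapping would come from `IAnnotations(obj)` in zope.annotation. The function name `purge_scales` is made up for this example.

```python
SCALES_KEY = "plone.scale"  # assumption: annotation key used for stored scales


def purge_scales(annotations, dry_run=True):
    """Count the stored scales in an annotations mapping.

    With dry_run=True nothing is changed; with dry_run=False the whole
    scale storage is removed, so scales get regenerated on next request.
    Returns the number of scales found.
    """
    scales = annotations.get(SCALES_KEY)
    if not scales:
        return 0
    count = len(scales)
    if not dry_run:
        del annotations[SCALES_KEY]
    return count


# Plain dict standing in for the annotations of one image object.
fake_annotations = {SCALES_KEY: {"uid-1": {"width": 128}, "uid-2": {"width": 400}}}
print(purge_scales(fake_annotations))                 # dry run: only reports
print(purge_scales(fake_annotations, dry_run=False))  # actually removes
```

On a real site one would loop over the catalog results for images, call this for each object, and commit the transaction only after checking the dry-run numbers.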

So for a site with lots of images, would it be better to use an external image service (e.g. https://cloudinary.com/) or to develop a blob-based storage for image scales (based on https://github.com/plone/plone.scale/blob/master/plone/scale/storage.py)? Would it then be possible to serve the image scales directly from the file system with e.g. nginx? How much effort would one estimate for building such a blob-based image scale storage?