Filesystem issues with very large blobstorages (inodes)

Does anyone have experience with running Plone in a datacenter with specialised filesystems?

One of our clients, whom we help with technical issues, has a Plone site with a very large blobstorage (100,000+ files). Their system administrators run most of the systems on a SAN, which until recently used ext4. Now they've switched SAN suppliers and the new filesystem is a 'high performance optimised' etc. filesystem, but it has a smart way of splitting metadata and data into separate storage locations, for which space is reserved in advance. And they made a miscalculation there.

Only after the migration did they find out that Plone stores blobs in deeply nested subdirectories (requiring many inodes and other metadata per blob), and that regularly backing up the blobstorage to the SAN with collective.recipe.backup, which uses rsync with hardlinks, makes things worse. I.e. they were very quickly running out of inodes.
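
For context, the rsync/hardlink rotation works roughly like the sketch below (this is not collective.recipe.backup's actual code, and the paths and naming scheme are made up): each new copy hardlinks unchanged blob files against the previous one, so data blocks are shared, but every directory and every link in every copy still costs an inode.

```python
import datetime
import subprocess
from pathlib import Path

# Hypothetical paths, for illustration only.
BLOBSTORAGE = Path("/srv/plone/var/blobstorage")
BACKUPS = Path("/mnt/san/blobstoragebackups")

def backup_blobs():
    """Create a new timestamped copy of the blobstorage.

    Unchanged blob files are hardlinked against the newest existing
    copy, so their data blocks are shared between backups -- but every
    directory and every hardlink in every copy still needs its own
    inode, which is what exhausts the SAN's metadata budget.
    """
    previous = sorted(p for p in BACKUPS.iterdir() if p.is_dir())
    target = BACKUPS / datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    cmd = ["rsync", "-a", "--delete"]
    if previous:
        # Hardlink files that are identical to the most recent copy.
        cmd.append("--link-dest=%s" % previous[-1])
    cmd += ["%s/" % BLOBSTORAGE, "%s/" % target]
    subprocess.check_call(cmd)
```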

Blobstorage moved from the lawn to the bushy layout years ago because, from what I understood, filesystems at that time didn't like 50,000 files in one directory. The above is, I think, an example where advancements in filesystem technology cause the opposite problem. 80% of the blobstorages in the world will probably be hosted on some unix/linux variant with ext4. There is a layout strategy in ZODB/blobstorage and a migration tool was provided to move between bushy and lawn, but those are the two extremes. Did anyone have a need in the last years for a layout strategy in between?
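
To make the lawn/bushy difference concrete, here's a hand-rolled sketch of how the two layouts map an 8-byte OID to a path (the real implementations are the LawnLayout/BushyLayout classes in ZODB.blob; the exact formatting below is simplified):

```python
def lawn_path(oid: bytes) -> str:
    """Lawn layout: a single directory per blob OID, with the blob
    revision files inside it (about 2 inodes per blob)."""
    return "0x%s" % oid.hex()

def bushy_path(oid: bytes) -> str:
    """Bushy layout: one directory level per OID byte, i.e. 8 nested
    directories plus the blob file; upper levels can be shared between
    blobs with a common OID prefix."""
    return "/".join("0x%02x" % byte for byte in oid)

oid = (0x01d4).to_bytes(8, "big")  # an arbitrary example OID
print(lawn_path(oid))   # 0x00000000000001d4
print(bushy_path(oid))  # 0x00/0x00/0x00/0x00/0x00/0x00/0x01/0xd4
```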

sounds like they might be using XFS…

your problem seems complicated to solve; the only workaround I see is creating backups using the zipbackup script in collective.recipe.backup; that way, the number of inodes will remain almost the same after the backup is run.
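
The idea behind that (sketched below with plain tarfile, which is not how collective.recipe.backup implements zipbackup, just the principle) is that each backup run produces a single archive file, so the nested blob directories only exist inside the archive and the per-run inode cost collapses to roughly one:

```python
import datetime
import tarfile
from pathlib import Path

# Hypothetical paths, for illustration only.
BLOBSTORAGE = Path("/srv/plone/var/blobstorage")
BACKUPS = Path("/mnt/san/blobstoragezips")

def zip_style_backup():
    """Pack the whole blobstorage into one compressed archive.

    The nested blob directories only exist inside the archive, so a
    backup run costs roughly one inode on the SAN -- but also the full
    size of the blobstorage, since nothing is shared between runs.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = BACKUPS / ("blobstorage-%s.tar.gz" % stamp)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(BLOBSTORAGE, arcname="blobstorage")
    return archive
```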

I would post this question to the very active ZODB mailing list.

One of the strategies will indeed be to move to zipbackups, but that will be very space-inefficient compared to rsync/hardlinks.

I have the same issue on a prod server.
I think I will try the zipbackup approach you recommend.

Thanks

I thought your problem was inodes and not disk space :wink:

can you elaborate more so we can all learn a little bit more?

@hvelarde

The problem with a large blobstorage is that it uses up a lot of inodes: a minimum of 9 inodes (8 directories and a file) per blob. When you back up your blobstorage twice a day and keep 15 days of backups, you'll use 9*30 = 270 inodes per blob. With a 100,000 object blobstorage that's 27 million inodes.

If your blobstorage is 10 GB, you'll only spend another 10 GB or so on the blobstorage backups, because normally c.r.backup uses hardlinks for all the backup revisions. If you switch to zipbackup for the blobstorage you'll use far fewer inodes, but you'll use 30×10 GB of storage, because there's no hardlink deduplication anymore. So you trade inodes for storage.
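
As a back-of-the-envelope comparison, using the same made-up numbers:

```python
# Back-of-the-envelope trade-off using the numbers from this thread.
blobs = 100_000          # blobs in the blobstorage
inodes_per_blob = 9      # 8 nested directories + the blob file (bushy layout)
backups = 30             # twice a day, kept for 15 days
blobstorage_gb = 10

# rsync + hardlinks: cheap on disk, expensive on inodes.
hardlink_inodes = blobs * inodes_per_blob * backups   # 27,000,000
hardlink_gb = blobstorage_gb + blobstorage_gb         # live copy + ~one extra copy of the data

# zipbackup: cheap on inodes, expensive on disk.
zip_inodes = backups                                  # roughly one archive per run
zip_gb = blobstorage_gb + backups * blobstorage_gb    # live copy + 30 full archives

print(hardlink_inodes, hardlink_gb)  # 27000000 20
print(zip_inodes, zip_gb)            # 30 310
```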

yes, I understand that but, again, I thought your problem was inodes and not disk space.

on the other hand, maybe you have to review your backup strategy: doing full backups twice a day is a bad idea; normally you do incremental backups, which consume a lot less space because you only back up content that has been added or changed since the last backup; that way you never replicate the whole database and you should have an issue with neither inodes nor disk space.

the only time when you should generate a new full backup is after a database compaction, AFAIK.

But again, where did I state that I have a problem with disk space? I only said it's space-inefficient.

c.r.backup doesn't yet have the extra logic/tooling to do differential blobstorage zip-backups and -restores, but this could be added.
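
A naive sketch of what such a differential blobstorage zip-backup could look like (purely hypothetical, nothing that c.r.backup ships today): only archive blob files whose mtime is newer than the previous run, and restore by unpacking the last full archive plus all subsequent differentials.

```python
import os
import tarfile
import time
from pathlib import Path

# Hypothetical paths, for illustration only.
BLOBSTORAGE = Path("/srv/plone/var/blobstorage")
BACKUPS = Path("/mnt/san/blobstoragezips")
STAMP_FILE = BACKUPS / "last-backup-timestamp"

def differential_zip_backup():
    """Archive only blob files changed since the previous run.

    Restoring would mean unpacking the last full archive plus every
    differential archive made after it, in order.
    """
    since = float(STAMP_FILE.read_text()) if STAMP_FILE.exists() else 0.0
    now = time.time()
    archive = BACKUPS / ("blobs-diff-%d.tar.gz" % int(now))
    with tarfile.open(archive, "w:gz") as tar:
        for root, _dirs, files in os.walk(BLOBSTORAGE):
            for name in files:
                path = Path(root) / name
                if path.stat().st_mtime > since:
                    tar.add(path, arcname=str(path.relative_to(BLOBSTORAGE)))
    STAMP_FILE.write_text(str(now))
    return archive
```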

thanks, I didn't know that; I just reviewed the documentation and now it's clear to me how it works and what the limitations are.

according to this article, it is possible to implement incremental backups using rsync:

maybe @cleberjsantos wants to help implementing this feature :wink:

@hvelarde Cool, I sure do :smiley:

We use https://bitbucket.org/nikratio/s3ql/ to back up Plone to S3 or Swift. The nice thing it does is that it not only stores hardlinks but also does deduplication, so copies of the same data aren't stored twice. It might seem a little crazy, but it supports a local filesystem, so you could use it to get around your inode issue without taking up a lot more disk. It doesn't require any retooling as it's a virtual filesystem. We use it via docker, so we don't even need to install system packages. Of course you are adding extra layers, and so more points of failure, so use at your own risk.