Pros and cons keeping blobs in the RelStorage

dest81 · January 12, 2020, 9:49pm

We have a few Plone instances on separate VMs. We use RelStorage with MySQL and an NFS folder to share blobs between the instances.
Since version 3, the RelStorage documentation recommends keeping blobs in the database (they even changed the default value of shared-blob-dir to false).
Could someone explain why? I believe it will slow down the database when there are many small and big blob files.
Has anyone measured the performance?
I see one reason: it could make the backup strategy easier.

I found this very nice article Pros and cons keeping blobs in the RelStorage. And yes, a shared blob storage doesn't allow the parallel commit feature. RelStorage 3 also improved blob cache maintenance. For now we use RelStorage 1.5.1 and we don't have many editors, so editing conflicts shouldn't be an issue.

So the final question is: does it make sense to migrate the blob storage from a folder to the DB if the RelStorage version is older than 3 and editing conflicts are not an issue?

agitator · January 13, 2020, 6:46am

Maybe https://github.com/zodb/relstorage/issues/392 gives you some answers; it did for me.

agitator · January 13, 2020, 6:58am

And about performance https://dev.nextthought.com/blog/2019/11/relstorage-30.html

dest81 · January 13, 2020, 8:48am

Thanks @agitator, so if you are using version 3 or higher, it is recommended to move the blob files into the DB.
What if we use version 1.5.x or 2.x and don't care about parallel commits?
We would like to move the blobs into the DB, and we will not update RelStorage to version 3. I expect that the database could get slower, especially for us, as our blobs are 3 times bigger than the DB. Any thoughts?
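To put a number on "3 times bigger", a quick way to size up the shared blob folder before deciding is something like the sketch below. It is only a rough sketch: the function name `summarize_blob_dir`, the 1 MiB threshold for "large", and the assumption that blob files on disk end in `.blob` are mine, not something taken from the RelStorage docs.

```python
import os
import sys


def summarize_blob_dir(blob_dir, large_threshold=1024 * 1024):
    """Walk blob_dir and count *.blob files and their total size in bytes.

    Assumption: blob files in the shared folder are named "<something>.blob";
    anything else (layout markers, temp files, locks) is ignored.
    """
    stats = {"files": 0, "bytes": 0, "large_files": 0, "large_bytes": 0}
    for root, _dirs, files in os.walk(blob_dir):
        for name in files:
            if not name.endswith(".blob"):
                continue
            size = os.path.getsize(os.path.join(root, name))
            stats["files"] += 1
            stats["bytes"] += size
            # Track "large" blobs separately, since those are the ones
            # most likely to weigh on the database.
            if size >= large_threshold:
                stats["large_files"] += 1
                stats["large_bytes"] += size
    return stats


if __name__ == "__main__":
    # Usage: python blob_stats.py /path/to/blobstorage
    print(summarize_blob_dir(sys.argv[1]))
```

Comparing the reported `bytes` with the current size of the database gives a rough idea of how much the DB would grow if the blobs were moved into it.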

jensens · January 13, 2020, 10:04am

So far I only have experience with Oracle and blobs in the DB, which works perfectly fine on 1.6.3.
I know from a customer that MySQL is not well suited for large DBs, including in-DB blobs.
But I have no experience with Postgres here.

Best to create an issue labeled "Question" at https://github.com/zodb/relstorage/issues to get a detailed answer.

mauritsvanrees · January 13, 2020, 3:50pm

From what I remember from the last time I used RelStorage (several years ago, I think version 1.5/1.6), Postgres had support for large objects, and RelStorage is using that. I guess these large objects would be similar to the Plone blob storage, so handled outside of the 'real' postgres database. Or somehow stored efficiently in postgres.
The blob_chunk table from RelStorage uses it. A quick GitHub search only showed me some related code in a cleanup method.

Summary: saving blobs in Postgres is probably efficient enough. But your mileage may vary, and if your Plone instances need to fetch those blobs often anyway, then shared network storage may be fine too.

dest81 · January 17, 2020, 9:40am

Yes, you are right, Postgres uses the pg_largeobject table for storing files.
It should be efficient enough, but I still don't think the database should be busy serving big files. RelStorage has settings to configure a cache and help the database with big files, but I am not sure how efficient the RelStorage cache is in old versions, if at all.
I have decided to keep the blobs in the shared folder for now (until we migrate to RelStorage 3).

Thank you everyone for the comments