We're looking at bulk adding a large number (some thousands) of files to our Plone site. They will be primarily PDF files. One concern we have is about database bloat, and another concern is how to manage assigning useful metadata to all of those files (which at present are stored on a simple filesystem).
To attack the first question, I verified that our instances (dev and prod) of Plone have blob storage configured, which they do. I then uploaded 256 PDFs totalling approximately 135 MB through the built-in "Upload" functionality into a folder.
The "var/blobstorage" folder increased from 15 MB to 151 MB, as expected -- pretty much exactly the upload size. So far so good.
However, the var/filestorage folder also increased from 83 MB to 170 MB. That's not the full size of the files (the increase is only 87 MB against 135 MB of PDFs), but it's a non-trivial amount. Given that the PDFs are full-text searchable, is that growth likely to be just the full-text index data, or is something else going on? (If it's the full-text data, that's not so bad -- we really want that functionality, and many of our documents will be CAD drawings, so they won't be as text-heavy as these were.)
I just want to make sure I'm not missing something about this process.
Also, should I even be worried about the database across this sort of operation, or does Plone just take this all in stride? I don't know how big a deal dropping a few thousand PDFs on a Plone site really is.
Okay, for the metadata part: our ideal scenario would be to provide a CSV template that the managers of the departments in question can open in Excel, populate with the relevant data, and hand back to us so we can map it onto the uploaded files. I see that something like this was being worked on through GSoC extensions to collective.importexport, but I don't see anything indicating it was completed. Is there some reasonable way of doing this? It's not something we're going to do every day, so the solution doesn't have to be hyper-elegant, and it can be something IT needs to run (as long as the people providing the data can use Excel or something similarly familiar for their step) -- but I'm not really at "develop my own Plone add-on" skill level at this time either.
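To make the idea concrete, here's a rough sketch of the mapping step I have in mind. The CSV columns (`filename`, `title`, `description`, `subject`) and the sample rows are made up for illustration; the Plone-side application via plone.api is shown only as comments, since I'd run that part in a debug session and I'm not sure of the exact incantation:

```python
# Sketch: read a hypothetical metadata.csv keyed by filename, so each
# uploaded File object can be matched to its row of department metadata.
import csv
import io

# Stand-in for the CSV the department managers would fill out in Excel.
CSV_TEXT = """filename,title,description,subject
drawing-001.pdf,Pump Assembly,Sectional view of pump P-101,pumps;assemblies
manual-002.pdf,Operator Manual,Line 2 operator manual,manuals
"""

def load_metadata(fh):
    """Return a {filename: row-dict} mapping from a CSV text stream."""
    return {row["filename"]: row for row in csv.DictReader(fh)}

metadata = load_metadata(io.StringIO(CSV_TEXT))

# In a Plone debug/run session, I imagine applying it roughly like this
# (untested; folder path and field names are assumptions):
#
#   from plone import api
#   for brain in api.content.find(portal_type="File", path="/plone/uploads"):
#       meta = metadata.get(brain.getId)
#       if meta:
#           obj = brain.getObject()
#           obj.title = meta["title"]
#           obj.description = meta["description"]
#           obj.subject = tuple(meta["subject"].split(";"))
#           obj.reindexObject()

print(metadata["drawing-001.pdf"]["title"])  # -> Pump Assembly
```

Is something along these lines sensible, or is there an existing add-on that already does the matching for you?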
One last database question:
This scenario is an extension of the site's original concept, and we didn't allocate enough disk for this much file storage at the time. If I shut down the site, add a new virtual disk, move the var folder over there, symlink it back to its original location, and start the site up again, should that be more or less okay? Or is it better to modify buildout.cfg to point at the new paths, re-run buildout, and then move the folders over? Or something else?
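For concreteness, the symlink option I have in mind would look roughly like this. The sketch runs against throwaway temp directories rather than a real install (the paths are placeholders), and whether it's `bin/plonectl` or `bin/instance` depends on the install type, so those steps are just comments:

```python
# Sketch of "move var to the new disk, symlink it back", demonstrated on a
# disposable directory layout so nothing real is touched.
import os
import pathlib
import shutil
import tempfile

plone_home = pathlib.Path(tempfile.mkdtemp())  # stands in for the buildout root
new_disk = pathlib.Path(tempfile.mkdtemp())    # stands in for the new disk's mount

# Fake the existing var layout for the demonstration.
(plone_home / "var" / "filestorage").mkdir(parents=True)
(plone_home / "var" / "blobstorage").mkdir()
(plone_home / "var" / "filestorage" / "Data.fs").touch()

# 1. (real life) stop the site first: bin/plonectl stop (or bin/instance stop)
# 2. move the whole var folder onto the new disk
shutil.move(str(plone_home / "var"), str(new_disk / "var"))
# 3. symlink it back so the buildout-generated paths still resolve
os.symlink(new_disk / "var", plone_home / "var")
# 4. (real life) start the site again: bin/plonectl start

print((plone_home / "var" / "filestorage" / "Data.fs").exists())  # -> True
```

Is there any reason the symlink would trip up ZEO/blob handling, or is this as safe as it looks?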
Many thanks for any help you can provide!