I'm looking for guidance on managing large file uploads. My use case is that I need to allow multiple users to upload files as large as 1 GB into Plone. The idea is to present Plone as a "dropbox" solution.
Repeating my standard answer from 10 years ago: Plone is not a data grave but a CMS. What you are trying to do is one way of misusing Plone.
You may want to look into xmldirector.plone, which allows you to mount various storages into Plone (Dropbox support is in the making). See my talk from the last Plone conference.
Likely, you are using Plone with "ZServer" as network interface (rather than "WSGI"). In this case, you will need to adapt a "ZServer" configuration option (which controls the maximum size of a request body). Potentially, your HTTP server needs special configuration, too.
Once you have properly configured "ZServer", it will copy larger uploaded files to temporary files (you will need to provide appropriate space in the operating system area used for temporary files). Only after the upload is complete (and the whole file is locally present in the temporary file) will "ZServer" pass the request on to Plone. This implies that you have no means whatsoever to influence the uploading itself (e.g. to define file size limits based on users or target destinations).
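To make the point about temporary space concrete, here is a minimal sketch for checking whether the temp area has room for the uploads you expect. The function name, the `concurrent_uploads` parameter and the 20% headroom are my own assumptions for illustration; nothing here is part of ZServer or Plone.

```python
import shutil
import tempfile


def temp_space_ok(expected_upload_bytes, concurrent_uploads=1, headroom=0.2):
    """Return True if the temp area can hold the expected uploads plus headroom.

    All parameters are illustrative assumptions, not ZServer settings.
    """
    needed = expected_upload_bytes * concurrent_uploads * (1 + headroom)
    # tempfile.gettempdir() is the directory Python itself uses for temp files;
    # the server may be configured to use a different one.
    free = shutil.disk_usage(tempfile.gettempdir()).free
    return free >= needed


if __name__ == "__main__":
    one_gb = 1024 ** 3
    # Five users uploading 1 GB files at the same time.
    print(temp_space_ok(one_gb, concurrent_uploads=5))
```

A check like this could run from your monitoring rather than inside Plone, since the upload has already been spooled by the time Plone sees the request.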
Processing huge files in Plone may take a long time (e.g. for text extraction and indexing) and blocks a "worker" during this time. This may seriously hamper the response time of the entire site. It is therefore important to make such tasks asynchronous and perform them in a separate thread. Products like "QueueCatalog" may help with this.
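The general pattern of handing slow work to a separate thread can be sketched in plain Python as below. This is only an illustration of the idea, not how "QueueCatalog" is implemented; `start_indexing_worker` and `index_func` are hypothetical names, and a real Plone setup would additionally need its own ZODB connection and transaction handling in the worker thread.

```python
import queue
import threading


def start_indexing_worker(index_func):
    """Start a daemon thread that processes queued items one at a time."""
    tasks = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: stop the worker
                tasks.task_done()
                return
            try:
                index_func(item)
            except Exception:
                # A real setup would log this; the sketch just keeps the worker alive.
                pass
            finally:
                tasks.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return tasks, thread


if __name__ == "__main__":
    indexed = []
    tasks, thread = start_indexing_worker(indexed.append)
    tasks.put("big-file.pdf")  # the request handler returns right after this
    tasks.join()               # only here to wait for the demo to finish
    print(indexed)
```

The request that triggered the upload only pays for the `put()` call; the expensive work happens later without blocking a "worker".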
Overall I agree with Andreas, but...
... it always depends, and it is possible.
First you have to configure NGINX to buffer the uploads before they are handed over to Zope. This is independent of Zope/Plone and well documented.
Second - for sure - you have to use a "modern" Zope and Plone with blob storage. Ensure blobs are not sent to the clients over the ZEO connection but are accessed through some mount (NFS, some storage system, ...): use the shared blobs configuration.
Third, if you plan to scale and want to reduce internal bandwidth you might want to have a look at collective.xsendfile.
Fourth, depending on your case, wildcard.media may come in handy.
And fifth - have your monitoring set up (munin or similar) to check system health, bandwidth, ... to be able to detect any problem areas early.
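On the third point, the idea behind X-Sendfile style delivery can be sketched with a tiny WSGI app: the application answers with a header only, and the front-end server streams the file itself. This is not collective.xsendfile's code; the `/protected-blobs` prefix and the path mapping are assumptions for illustration, and with NGINX the prefix would have to match a location marked `internal`.

```python
def xsendfile_app(environ, start_response):
    """Tiny WSGI app: reply with a header only and let the front end send the file."""
    # Hypothetical mapping: the request path is appended to an internal prefix.
    internal_path = "/protected-blobs" + environ.get("PATH_INFO", "")
    headers = [
        ("Content-Type", "application/octet-stream"),
        # NGINX reads X-Accel-Redirect; Apache's mod_xsendfile reads X-Sendfile.
        ("X-Accel-Redirect", internal_path),
    ]
    start_response("200 OK", headers)
    return [b""]  # empty body: the front-end server supplies the file content
```

The benefit is that the large file never travels through the application process, which keeps the "worker" free and reduces internal bandwidth.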
I challenge the point "Plone as Dropbox" solution.
If you want/need something Dropbox-ish then use something Dropbox-ish but not Plone.
Integration of "drive" solutions with the underlying OS is much better solved, more approachable and more user-friendly than any Plone-related solution. OK, you can nowadays drag & drop files for upload into a browser, but... who is actually doing this? I guess nobody...
This is slightly off topic, but I mention it since it might cover some similar use cases (for example, if this is mainly a way to send big files from the website).
wetransfer.com is a great way to send large files (I volunteered to make the Norwegian translation for the first version) and has an API ( https://github.com/search?l=Python&q=wetransfer&type=Repositories&utf8=✓ ).
PS: I assume there is a reason you don't just want to use the Dropbox API (and maybe index the files in Plone).
If this is the main reason, you might consider a 'Media Asset Manager' (we used a product from canto.com in my former job many years ago).
a) Plone already supports uploading large files. @vangheem implemented it. It currently has a slight bug, but all the code is there and it works using wildcard.foldercontents, I believe. (https://github.com/plone/plone.app.content/issues/64)
b) While I agree that making Plone into an app like Dropbox is probably not a great idea, a modern CMS should allow uploading large files over unreliable networks. A perfect example is videos to be transcoded, such as with plumi or wildcard.media.
But anyway, Plone supports large file uploads with none of the complex solutions mentioned above. Yay for @vangheem!
I may have overstated the term Dropbox, but I get what you're saying. We could easily provide a Dropbox and then just require them to share a URL to Plone.
As usual... all this feedback is very valuable, so thanks all!
Also not very related.
Plonetruegallery uses an approach of showing Flickr and/or Picasa images.
This approach might work for other use cases.
(I did start a collective.ptg.dropbox project, but it was never finished... There is some code here, but it is probably completely useless: https://github.com/collective/collective.ptg.dropbox )
In case someone does not know: it is possible to turn a folder in Dropbox into a URL, like this: https://www.dropbox.com/sh/mv0getza0zox9ls/AADvnpHyZXZ9kknDj9eC4Alna?dl=0
(which also means you can host a static website with Dropbox)
I forgot to mention that the TUS upload system built into Plone requires the use of sticky sessions or a shared temp folder. It breaks large files down into 2 MB blocks in the browser and sends them one at a time, and the server stores them in a temp dir on an instance box. When the final block is received, it joins them together and moves the result into the blob dir.
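The flow described above can be sketched roughly like this. It is not the TUS protocol and not Plone's actual implementation, just the split / store / join / move steps in plain Python; all function names, the `.part` naming and the directory layout are assumptions for illustration.

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB blocks, as described above


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Yield (index, block) pairs, the way the browser side would slice a file."""
    for index, start in enumerate(range(0, len(data), chunk_size)):
        yield index, data[start:start + chunk_size]


def store_chunk(upload_dir, index, block):
    """Write one received block into the per-upload temp directory."""
    # Zero-padded names so that a plain sort gives the right order.
    with open(os.path.join(upload_dir, "%08d.part" % index), "wb") as fh:
        fh.write(block)


def assemble(upload_dir, target_path):
    """Join all stored blocks in order, then move the result to its final place."""
    joined = os.path.join(upload_dir, "joined.tmp")
    with open(joined, "wb") as out:
        parts = sorted(n for n in os.listdir(upload_dir) if n.endswith(".part"))
        for name in parts:
            with open(os.path.join(upload_dir, name), "rb") as part:
                shutil.copyfileobj(part, out)
    shutil.move(joined, target_path)


if __name__ == "__main__":
    temp_dir = tempfile.mkdtemp()   # stands in for the instance temp dir
    blob_dir = tempfile.mkdtemp()   # stands in for the blob dir
    payload = os.urandom(5 * 1024 * 1024)
    for index, block in split_into_chunks(payload):
        store_chunk(temp_dir, index, block)
    assemble(temp_dir, os.path.join(blob_dir, "upload.bin"))
```

The sketch also shows why sticky sessions or a shared temp folder are needed: every block has to land in the same `upload_dir`, or `assemble` will not find all the parts.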
Now that I think about this, @vangheem, using the temp dir is perhaps not a good idea. If it stored the parts directly in the blob dir and the blob dir was shared (as is common), you wouldn't need to have sticky sessions. Also, if temp and blobs are on different filesystems, the move would be slower than if they are on the same filesystem, right?
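On the filesystem question: as far as I know, a move within one filesystem is just a rename, while a move across filesystems falls back to copying the data and deleting the source (that is what Python's `shutil.move` does). A quick way to check which case applies is to compare device IDs; `same_filesystem` is a hypothetical helper name and both paths must exist.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths live on the same device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, `same_filesystem("/tmp", "/path/to/blobstorage")` (with your own paths substituted) returning False would mean every finished 1 GB upload gets copied once more before it lands in the blob dir.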
If you want to manage the large uploads in your Plone application, then you could use the Products.Reflecto product. It is a tool to incorporate part of the file system into a Plone site. It allows you to browse through a filesystem hierarchy and access the files in it. Files are represented as simple downloadable objects, not as full CMF or Plone content types.
A bit off topic (again), but this looks quite nice (looks like it includes subfolders):