Awesome outcome!
I'm also happy to read "WebDAV" in there. This is still a hidden gem in Plone.
Man, that is so AMAZING! Thank you so much for working on this Asko. This opens up so many possibilities for Plone and Volto. I can't wait to see this at the Beethoven Sprint in Bonn!
WebDAV is still an important protocol. Unfortunately, the Zope implementation is as buggy as it was back in the early 2000s.
Agreed on that. I only made sure that the old code still runs; I did not touch it. But if there is a recommended Python WebDAV library, I could check whether it could be integrated.
No recommendation. I only know of WebDAV client libraries, and those are not relevant here.
Thank you for the reminder. I might actually take a stab at fixing that. This has also annoyed me for years.
And thank you @datakurre for making this a viable thing to still do at this point.
@tschorr You asked whether I could point out where the difference between ZServer and Waitress is. I don't see as much similarity between them as you do, but I'll try...
Here ZServer creates a new channel for every new client connection on the main thread. That channel is an "asynchat"-type asyncore dispatcher object that consumes "producers". So in ZServer, streaming is delegated from the worker thread to the main thread: the worker thread pushes its stream iterator producer to the channel managed by the main thread (crossing the thread boundary).
To me it seems that waitress does not use a similar producer-consuming asynchat channel, but consumes streams directly in the waitress (worker) thread and pushes data directly to the channel buffer:
I would guess that this gives waitress better performance, because concurrent connections add less overhead for the main thread. But for Plone this is a problem, because we can only have as many threads as we can have ZODB connections.
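To make the two models described above easier to compare, here is a toy sketch. It is not ZServer or waitress code; `MainLoopChannel`, `worker_handoff`, `worker_direct` and `blob_chunks` are hypothetical names, and a plain queue stands in for the asynchat producer list. It only shows where the streaming work ends up in each model.

```python
import queue
import threading


def blob_chunks(n_chunks, size=4):
    """Hypothetical stand-in for a blob stream iterator."""
    for _ in range(n_chunks):
        yield b"x" * size


class MainLoopChannel:
    """Toy model of a channel that is owned by the main thread."""

    def __init__(self):
        self.producers = queue.Queue()
        self.sent = bytearray()

    def push(self, producer):
        # Called from a worker thread: hand over the iterator and return.
        self.producers.put(producer)

    def drain(self):
        # Called from the main thread: consume every queued producer.
        while True:
            try:
                producer = self.producers.get_nowait()
            except queue.Empty:
                return
            for chunk in producer:
                self.sent.extend(chunk)


def worker_handoff(channel, n_chunks):
    # ZServer-like model: the worker is free again right after the push.
    channel.push(blob_chunks(n_chunks))


def worker_direct(buffer, n_chunks):
    # Waitress-like model: the worker stays busy until the blob is consumed.
    for chunk in blob_chunks(n_chunks):
        buffer.extend(chunk)


def demo():
    channel = MainLoopChannel()
    worker = threading.Thread(target=worker_handoff, args=(channel, 3))
    worker.start()
    worker.join()                     # worker is done, nothing streamed yet
    bytes_before = len(channel.sent)
    channel.drain()                   # the "main thread" does the streaming
    return bytes_before, len(channel.sent)
```

In the hand-off model the worker (and with it the ZODB connection) is released before a single byte is sent; in the direct model the worker is only released once the last chunk has been consumed.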
Not sure about this. write_soon() invokes the trigger module via the _flush_outbufs_below_high_watermark() method, and at the top of that module I find this comment:
I might be mistaken, but it seems to summarize the same mechanism you describe here (also it reveals some history of waitress).
To clarify my previous comment, it was related to the large file problem.
The request dispatching in waitress happens here:
Again, this looks pretty familiar from ZServer/medusa, except that the code is much cleaner and much more readable.
The problem is present with all blobs regardless of their size. Large files are just the tip of the iceberg that can DoS the site.
In my opinion you found the difference between ZServer and Waitress. ZServer channels are based on asynchat, and they consume blob file handles in the asyncore loop on the main thread. Waitress consumes the file handle in the worker thread, and that keeps the thread reserved until the blob has been served.
The Waitress implementation probably has better overall performance, but it has the issue of reserving threads, and that is bad for Plone.
@datakurre @tschorr My 5 cents.
I agree with Asko. I'm not an expert on ZServer/waitress internals, but I ran a test from an external point of view. I hope I haven't made wrong assumptions in my test; if I did, I'm sorry.
I did a small test (using https://www.python.org/dev/peps/pep-0333/#optional-platform-specific-file-handling; I don't know whether blob publishing over WSGI already uses it, but IMHO it would be better).
import os

def myapp(environ, start_response):
    f = open('bigfile.dat', 'rb')
    start_response('200 ok', [('Content-Length', str(os.path.getsize('bigfile.dat')))])
    return environ['wsgi.file_wrapper'](f)
bigfile.dat is a 1 GB file of random data. I tried this with waitress (1 thread) and bjoern (1 thread) using
ab -n 64 -c 64 http://127.0.0.1:8080/
During the test I made another request from a second shell:
wget http://127.0.0.1:8080/
With waitress, the wget waits for a connection until ApacheBench has finished; with bjoern, the request starts immediately.
I think that if the blob stream implemented wsgi.file_wrapper and the WSGI server implemented it well (https://github.com/Pylons/waitress/blob/master/docs/filewrapper.rst; judging by the last lines there is probably room for improvement), the gap between WSGI and ZServer for this use case could be closed.
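For reference, the PEP 333 section linked above suggests using the wrapper only when the server offers it. A minimal sketch of that pattern follows; it is not the Zope implementation, and `serve_file`, `_read_in_blocks` and the 64 KiB block size are assumptions made for illustration.

```python
import os

BLOCK_SIZE = 64 * 1024  # assumed chunk size, not taken from any server


def _read_in_blocks(f, block_size):
    """Fallback iterator that closes the file once it is exhausted."""
    try:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            yield chunk
    finally:
        f.close()


def serve_file(path, environ, start_response, block_size=BLOCK_SIZE):
    """Return a WSGI body for `path`, via wsgi.file_wrapper if available."""
    f = open(path, "rb")
    size = os.fstat(f.fileno()).st_size
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Length", str(size)),
    ])
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The server decides how to send the file (and closes it afterwards).
        return wrapper(f, block_size)
    # No wrapper offered: stream the file in plain blocks.
    return _read_in_blocks(f, block_size)
```

Whether the wrapper frees the worker thread depends entirely on the server; the application side only makes the optimization possible.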
@mamico I think you are touching here on another topic that is not related to this thread (not for me until now, at least).
It's pretty obvious from the waitress docs that the support is limited, specifically there's no sendfile support. But I can't find any reference to sendfile in ZServer either, nor can I find a reference to wsgi.file_wrapper in the Zope code. But I might be missing something.
Personally I find it more and more difficult to keep track of all the issues and potential flaws mentioned here. It started with supposedly missing thread pools and asyncore (I think we all agree by now that waitress uses both); now we're discussing asynchat and sendfile. What's next? :-)
@mamico simply confirmed what I mentioned earlier in this thread: non-blocking blob downloads currently require something other than the default waitress.
I don't understand the sendfile point. ZServer is old enough to not know anything about it. It simply implements streaming by itself (by passing stream iterators / medusa producers for asynchat to consume).
In my Twisted ZServer I use Twisted "FileSender" to do the same.
@tschorr Yesterday we wondered what the difference between the old ZServer and Waitress is, so that ZServer does not block when serving blobs but Waitress does. It has now turned out to be the difference in the "Channel" implementation.
Yesterday I discovered the Apache X-Sendfile module that a customer has apparently been using for years. It lets Apache send a file, so Plone does not need to handle it anymore. This could be interesting as an alternative to a thread in ZServer/waitress to serve a blob.
The module has not been updated since 2012. Maybe there are alternatives. This is for Apache, but I expect that you can do a similar thing in nginx.
Installation in Ubuntu 18.04: apt install libapache2-mod-xsendfile
Then, within a VirtualHost section in your Apache config, add something like this:
XSendFile on
XSendFilePath /home/files
Usage in a browser view:
# In a browser view, response is self.request.response
download = "some_file.ext"
response.setHeader(
    'X-Sendfile', "/home/files/%s" % download
)
response.setHeader('Content-Type', 'application/octet-stream')
response.setHeader(
    'Content-Disposition', 'attachment; filename="%s"' % download
)
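On the nginx side, the comparable mechanism is the X-Accel-Redirect header, which points at an internal location instead of a file system path. The sketch below builds the headers for either server; `offload_headers`, `safe_name` and the `/protected-files/` prefix are hypothetical, and the prefix would have to match an `internal` location in your own nginx config. It also rejects file names that could escape the directory, which matters as soon as the name comes from user input.

```python
import posixpath
from urllib.parse import quote


def safe_name(filename):
    """Reject names that could leave the base directory or break headers."""
    if filename in ("", ".", "..") or any(c in filename for c in '/\\"\r\n'):
        raise ValueError("unsafe file name: %r" % filename)
    return filename


def offload_headers(filename, server="apache",
                    base_dir="/home/files",
                    internal_prefix="/protected-files/"):
    """Build response headers that let the front-end server send the file.

    base_dir matches the XSendFilePath used above; internal_prefix is an
    assumed nginx location marked as `internal`.
    """
    name = safe_name(filename)
    if server == "apache":
        offload = ("X-Sendfile", posixpath.join(base_dir, name))
    elif server == "nginx":
        # Assumption: recent nginx versions expect an escaped URI here.
        offload = ("X-Accel-Redirect", internal_prefix + quote(name))
    else:
        raise ValueError("unknown server: %r" % server)
    return [
        offload,
        ("Content-Type", "application/octet-stream"),
        ("Content-Disposition", 'attachment; filename="%s"' % name),
    ]
```

In a browser view, each returned pair could be passed to response.setHeader(name, value) as in the snippet above.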
You may want to have a look at https://github.com/collective/collective.xsendfile
wsgi.file_wrapper is implemented in the latest Zope release.
I saw good performance improvements when serving blobs (and static resources): https://gist.github.com/mamico/c4e09e64793a73981da0f7bdb5bbec42