Plone with WSGI

Thanks to @tomgross (and Tres, of course), there's now a documented way to run Plone under WSGI with your favourite WSGI-compatible web server:

http://blog.toms-projekte.de/run-plone-with-wsgi.html

But for me, it's hard to see real benefits from it. Prove me wrong. Why should we run Plone with WSGI instead of Medusa?

Pros

  • WSGI is kind of standard (PEP 333)
  • Let IT department be responsible for the servers
  • Choose your favorite WSGI server
  • Easier port management (no more "instances")

Cons

  • No clock server or other async or non-blocking code.

another big Pro:

  • WSGI is kind of standard (PEP 333)

But is it good enough? What does it add featurewise?

For example, I'd be interested in implementing various plone.transformchain adapters as WSGI middleware, but unless those could be made asynchronous, they would not add much on their own.
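To make that concrete, here's roughly what I mean, as plain (blocking) WSGI middleware. The class name and the rewrite it does are made up for illustration, not actual plone.transformchain code:

```python
# Illustrative only: a transformchain-style post-processing step written
# as ordinary, blocking WSGI middleware (names and the rewrite are made up).
class LinkRewriteMiddleware(object):

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Defer the real start_response until the body is known.
            captured['status'] = status
            captured['headers'] = headers

        # This buffers the whole response in memory and keeps the worker
        # busy for the duration -- exactly the thing I'd like to avoid.
        body = b''.join(self.app(environ, capture))

        headers = captured['headers']
        content_type = dict(headers).get('Content-Type', '')
        if content_type.startswith('text/html'):
            body = body.replace(b'http://old.example.org',
                                b'http://new.example.org')
            headers = [(k, v) for k, v in headers if k != 'Content-Length']
            headers.append(('Content-Length', str(len(body))))

        start_response(captured['status'], headers)
        return [body]
```

The problem is visible right there: the middleware holds the worker while it buffers and post-processes the whole body.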

With Medusa, I could refactor most of those (all that don't need much ZODB access) to be executed after ZPublisher, so that they would free Zope workers to handle subsequent requests (and save memory by serving more requests with a single instance). How would I do that with WSGI?

How do things like Pyramid do it? I'm pretty sure they can pass off an iterator that is non-blocking.

Other filters in the pipeline would just need to know that they need to pass the request on and not do anything with it.
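As far as I understand PEP 333, handing the body over lazily would look something like this; whether it actually frees the worker between chunks depends entirely on the server, so take it as a sketch:

```python
# Sketch of "pass the server an iterator" per PEP 333: the body is read
# lazily in chunks instead of being buffered.  Whether the worker is
# actually free between chunks depends on the server implementation.
def blob_app(environ, start_response):
    path = '/tmp/testimage.jpg'  # made-up path
    start_response('200 OK', [('Content-Type', 'image/jpeg')])

    blob_file = open(path, 'rb')
    file_wrapper = environ.get('wsgi.file_wrapper')
    if file_wrapper is not None:
        # Optional PEP 333 hook: lets the server use sendfile() and friends.
        return file_wrapper(blob_file, 64 * 1024)

    def chunks():
        try:
            while True:
                chunk = blob_file.read(64 * 1024)
                if not chunk:
                    break
                yield chunk
        finally:
            blob_file.close()

    return chunks()
```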

They have an adapted version of Pyramid compatible with async WSGI. Yet, we cannot make Plone fully async, because we need to control the number of open ZODB connections (to optimize caching memory usage).

We could probably use semaphores on async WSGI to allow only 1-2 requests at a time to call ZPublisher, but then execute all the transform code asynchronously.
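In modern asyncio terms (which stock Plone/Zope 2 doesn't have, so this is purely a thought experiment with made-up callables), the idea would be something like:

```python
import asyncio

# Thought experiment only: gate the blocking ZPublisher call behind a
# small semaphore so only a couple of ZODB connections are in use at
# once, while transforms run outside the gate.  publish_blocking() and
# run_transforms() are hypothetical callables, not real Plone APIs.
ZPUBLISHER_SLOTS = asyncio.Semaphore(2)

async def handle_request(request, publish_blocking, run_transforms):
    loop = asyncio.get_event_loop()
    async with ZPUBLISHER_SLOTS:
        # At most 2 requests hold a ZODB connection at any moment.
        raw_body = await loop.run_in_executor(None, publish_blocking, request)
    # Post-publication transforms no longer occupy a semaphore slot.
    return await run_transforms(raw_body)
```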

Well, maybe with a fully externalized catalog it would be OK with a smaller ZODB cache, and it would be OK to have something like 10 simultaneous async requests, each with its own ZODB connection.

WSGI just replaces ZServer, right?
So could we say 'use 2 workers' with '2 threads' or something like that?

@jaroel Yes. But you have to be careful not to allow more simultaneous requests than the number of active ZODB connections you want in the connection pool (with caching).
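Since Gunicorn's config file is plain Python, the equivalent knobs would look roughly like this (the numbers are only illustrative; the point is keeping workers × threads within the ZODB pool-size from zope.conf):

```python
# gunicorn.conf.py -- illustrative numbers only.
# Rule of thumb from above: keep workers * threads at or below the
# ZODB pool-size in zope.conf (the pool defaults to 7 connections),
# or requests will queue for a free connection / eat cache memory.
bind = '127.0.0.1:8080'
workers = 2
threads = 2
timeout = 120
```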

I was wrong about the Zope WSGI publisher not having support for blob stream iterators. It does (it returns a filehandle-like iterable through the WSGI pipeline, which can be streamed on its own without a DB connection). Now I have to check how they behave and whether returning an iterator allows the WSGI server to process another request in the meantime.

Sure, that's the same as we have now :smile:
It's the option 'zserver-threads' in plone.recipe.zope2instance :wink:

And cap the number of connections with haproxy.

What this makes possible is that the IT department is responsible for the actual servers and we just have to provide a .wsgi file!
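If I read the Zope 2.13 startup code (and Tom's post) right, that hand-over file can be as small as this; the paths here are made up:

```python
# plone.wsgi -- paths are made up, see the linked post for the details.
from Zope2.Startup.run import make_wsgi_app

application = make_wsgi_app(
    {},  # global_config (unused here)
    zope_conf='/opt/plone/parts/instance/etc/zope.conf',
)
```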

How does haproxy handle blobs? I've always believed that non-blocking blob streaming is important.

The answer is: not well. Haproxy will block until the connection is closed, which underutilizes the Zope streaming support. You can set haproxy to allow more connections than there are Zope threads, but it's not an ideal solution, because if you have some CPU-intensive requests it can mean other requests get queued in Zope when they could have been handled elsewhere.
I did discuss this issue with the author of haproxy. I suggested a feature where you could set it to send new requests on first byte rather than on connection close. He thought it was not common enough to support, however.
For now I use c.xsendfile, which works well except for scaled images.
I'd be interested if anyone has another solution to this problem. Perhaps another load balancer? I know Zope Corp was working on a new one.

I'm still not really sure how blob streaming works with WSGI, but benchmarking suggests it doesn't really matter. Single-threaded ZServer (with async Medusa for blobs) and single-threaded Gunicorn have similar performance when downloading a blob:

ZServer with 1 worker thread:

Server Software:        Zope/(2.13.22,
Server Hostname:        localhost
Server Port:            8080

Document Path:          /Plone4/testimage
Document Length:        2613064 bytes

Concurrency Level:      20
Time taken for tests:   2.243 seconds
Complete requests:      200
Failed requests:        0
Total transferred:      522674200 bytes
HTML transferred:       522612800 bytes
Requests per second:    89.18 [#/sec] (mean)
Time per request:       224.263 [ms] (mean)
Time per request:       11.213 [ms] (mean, across all concurrent requests)
Transfer rate:          227600.44 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:   118  222  24.5    223     290
Waiting:       10   33  22.1     27     128
Total:        118  222  24.5    223     290

Percentage of the requests served within a certain time (ms)
  50%    223
  66%    228
  75%    232
  80%    234
  90%    245
  95%    253
  98%    288
  99%    290
 100%    290 (longest request)

Gunicorn with 1 worker thread:

Server Software:        gunicorn/19.3.0
Server Hostname:        localhost
Server Port:            8080

Document Path:          /Plone4/testimage
Document Length:        2613064 bytes

Concurrency Level:      20
Time taken for tests:   2.281 seconds
Complete requests:      200
Failed requests:        0
Total transferred:      522679400 bytes
HTML transferred:       522612800 bytes
Requests per second:    87.66 [#/sec] (mean)
Time per request:       228.144 [ms] (mean)
Time per request:       11.407 [ms] (mean, across all concurrent requests)
Transfer rate:          223731.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.8      0       7
Processing:    15  217  40.2    223     255
Waiting:        7  213  40.2    219     251
Total:         16  218  40.1    223     255

Percentage of the requests served within a certain time (ms)
  50%    223
  66%    233
  75%    236
  80%    238
  90%    244
  95%    249
  98%    252
  99%    255
 100%    255 (longest request)

Here is perhaps one reason WSGI might be a good idea.


I'm not 100% sure I understand it, but I believe it lets you dynamically adjust which workers get which kind of requests by giving periodic feedback to the load balancer. You could, for instance, start telling the LB about frequent large blob requests, making sure one instance gets them most often.
It's low on documentation though, so it doesn't say how it decides when to send the next request.