[Zope4/WSGI] ZServer threads vs. WSGI server

zopyx · January 2, 2019, 6:28am

The ZServer in Zope traditionally worked with a configurable number of threads, so a ZServer instance could never process more requests in parallel than it had threads.

What does the change to a WSGI based server mean? Can we have an arbitrary number of requests in parallel?
How about the per-thread ZODB object caches?

Any opinion on using/supporting asyncio?

djay · January 2, 2019, 7:10am

Pretty sure Jim Fulton covered this in his keynote at the conference. It's also well covered in other posts.

The answer is no asyncio, no unlimited threads. One thread per cache remains because there is no way to lock objects. Jim did mention something about potentially trying something in the future whereby you use some kind of memory protection to allow multiple threads to access the same cache up until one modifies something and then you block. Something like that. That wasn't in his talk I think.

I believe the best way to solve issues around threads being blocked (ie not CPU bound) currently is collective.futures or using p.a.async or similar.

jordic · January 13, 2019, 11:41am

Just digging inside the waitress repo.. found this https://github.com/etianen/aiohttp-wsgi related to a deprecation of asyncore (though ZServer is based on this).
Not sure if it makes sense, but it's a good path to mix async with sync handlers.

djay · January 14, 2019, 5:50am

It doesn't really matter whether Zope uses asyncio or asyncore to handle incoming connections. It doesn't change how you build a Zope application: the ZODB objects in memory can't be locked, so two threads can't access the same object at the same time, and we are stuck with a single thread per ZODB cache at the moment. @jimfulton does talk about his ideas to change this in his presentation. Please watch his presentation. It explains all this:

http://jimfulton.info/talks/plone-2018/#/step-23

Video here https://2018.ploneconf.org/talks/day2-keynote-by-jim-fulton

I think the copy-on-write idea sounds really promising. The overhead of the ZODB cache is the biggest headache with scaling Plone: you need to avoid calling any external API in your code, or any other I/O wait situation.

Rotonen · January 14, 2019, 11:15am

It does. I've yet to see a WSGI setup with different resource pools for transient resources like uploads and downloads of files. ZServer has tiered thread pools baked in.

jordic · January 14, 2019, 11:44am

I thought I watched the talk.. anyway, we have a deployment of guillotina that is doing really well.. it's not the same (I know), but for our use case it makes sense. We can develop using traversal and the rest of the nice features of the ZCA using asyncio (the single-thread model is also a feature.. like in node).
If you deploy Plone with WSGI (and gunicorn, for example) you need to use ZEO or RelStorage if you plan to have more than one process (workers in gunicorn language). For the rest, WSGI has more or less the same semantics as ZServer with a thread pool (but you lose all the related things around clockserver), which you can gain again using aiohttp.

jordic · January 14, 2019, 11:51am

Mmm.. you also need to take care of the timeouts (ZServer doesn't time out by default.. a design decision around the transaction machinery.. though gunicorn does time out, and if you rely on this feature for long-running requests you need to increase it..). Waitress from Pylons seems like a child of ZServer..
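
For reference, a minimal sketch of what raising the timeout could look like in a gunicorn config file (which is plain Python). The setting names are gunicorn's own; the bind address and all the numbers are only illustrative assumptions, not recommendations.

```python
# gunicorn.conf.py (sketch; every value below is an illustrative assumption)
bind = "127.0.0.1:8080"

# More than one worker process means the ZODB has to be shared,
# e.g. through ZEO or RelStorage, as noted above.
workers = 2

# Threaded worker: each process serves requests from a small thread pool.
worker_class = "gthread"
threads = 4

# Seconds before gunicorn kills an unresponsive worker. The default is 30,
# which long-running requests can easily exceed.
timeout = 300
```

Such a file is passed to gunicorn with `-c gunicorn.conf.py`.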

jimfulton · January 15, 2019, 1:04am

Yay WSGI. Honestly, I thought that had happened years ago.

Whether threads or greenlets or coroutines, you'd have a connection per logical thread of control and each thread has its own cache. If you have lots of logical threads (threads, greenlets, coroutines), you'll want smaller caches. At PloneConf I mentioned an idea for letting objects in separate caches share copy-on-write state.
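
A rough back-of-the-envelope sketch of why more logical threads call for smaller caches. It assumes every connection cache fills up to its configured target and that nothing is shared between caches; the function names and numbers are made up for illustration.

```python
def total_cached_objects(logical_threads: int, cache_size: int) -> int:
    """Approximate number of objects held in memory across all caches,
    assuming one connection (and one full cache) per logical thread."""
    return logical_threads * cache_size


def cache_size_for_budget(object_budget: int, logical_threads: int) -> int:
    """Per-connection cache size that keeps the total within a budget."""
    return object_budget // logical_threads


if __name__ == "__main__":
    # 4 OS threads with 30000 objects each...
    budget = total_cached_objects(4, 30000)
    print(budget)
    # ...versus the same memory budget spread over 100 greenlets.
    print(cache_size_for_budget(budget, 100))
```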

ZServer/zope.server/waitress use an architecture that's fairly common, at least in the Java world: use an async library for I/O and a thread pool for application logic. IDK if other WSGI servers provide this. It wouldn't be too hard to put a thread pool behind an async server.
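
A minimal sketch of that architecture using only the standard library: an asyncio loop does the dispatching and a fixed-size thread pool runs the synchronous application code. This is not how ZServer or waitress are implemented; `handle_request` is a hypothetical stand-in for the application, and the pool size of 4 is an arbitrary assumption.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(path: str) -> str:
    """Hypothetical synchronous application logic."""
    return f"{path} handled by {threading.current_thread().name}"


async def serve(paths, pool):
    loop = asyncio.get_running_loop()
    # The event loop only dispatches; each handler runs on a pool thread.
    jobs = [loop.run_in_executor(pool, handle_request, p) for p in paths]
    return await asyncio.gather(*jobs)


def main():
    # The pool size caps how many requests are processed in parallel.
    with ThreadPoolExecutor(max_workers=4, thread_name_prefix="app") as pool:
        return asyncio.run(serve(["/a", "/b", "/c"], pool))


if __name__ == "__main__":
    for line in main():
        print(line)
```

With a ZODB application, each pool thread would hold its own connection and cache, so the pool size is also the number of caches.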

Speaking of async libraries, my favorite is still gevent. Jason Madden runs a Zope 3-based (I think) application on gevent without change. Of course, you still need to be aware of how many connections you're using.

Remember, when using sexy asyncio, computation will block all your I/O.
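
A small sketch of that effect, standard library only. The `ticker` coroutine stands in for I/O handling and `busy_compute` for application code; both names are made up for the example. Because the computation never awaits, the loop cannot run the ticker until it finishes.

```python
import asyncio


def busy_compute(n: int) -> int:
    """CPU-bound work with no await points."""
    return sum(i * i for i in range(n))


async def ticker(events, count):
    """Stand-in for I/O handling: records a tick, then yields to the loop."""
    for i in range(count):
        events.append(f"tick {i}")
        await asyncio.sleep(0)


async def compute(events):
    events.append("compute start")
    busy_compute(200_000)  # the loop is stuck here; no other task can run
    events.append("compute end")


async def run():
    events = []
    await asyncio.gather(ticker(events, 3), compute(events))
    return events


if __name__ == "__main__":
    print(asyncio.run(run()))
```

No tick is ever recorded between "compute start" and "compute end", however long the computation takes.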

datakurre · May 12, 2019, 5:57pm

Asyncio is one of my main motivations to go with Twisted ZServer instead of WSGI, but my flavor of asyncio follows the old architecture where it is only used for I/O and scheduling, and ZPublisher calls remain fully synchronous.

For me this means that the same ZServer instance can run HTTP/2, WebSocket, ZMQ, AMQP, etc.. servers in the same asyncio loop to simplify integration of Plone with other services.

That said, at least Twisted makes it easy to schedule function calls from worker threads to the asynchronous main thread (to be delegated further). To have anything back is another story. I'm probably porting collective.futures and continuing to use that approach myself.
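
For illustration only, the plain-asyncio analogue of scheduling a call from a worker thread onto the loop thread is `loop.call_soon_threadsafe`. The sketch below is not Twisted and not the ZServer code; the function names and the message are hypothetical.

```python
import asyncio
import threading


async def main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    received = []
    done = asyncio.Event()

    def on_loop_thread(message):
        # Runs on the event loop thread, so it may touch loop-owned state.
        received.append((message, threading.get_ident() == loop_thread))
        done.set()

    def worker():
        # Runs on a worker thread (think: a synchronous publisher call)
        # and hands the function call over to the loop.
        loop.call_soon_threadsafe(on_loop_thread, "hello from worker")

    thread = threading.Thread(target=worker)
    thread.start()
    await done.wait()
    thread.join()
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This only covers the one-way direction, from the worker to the loop.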

Rotonen · May 13, 2019, 7:55am

You need a new scary name for the new medusa.

jaroel · May 13, 2019, 8:01am

Chrysaor?

jordic · May 20, 2019, 9:07pm

Someone could just try the asgi-wsgi adapter.. the main loop is in asyncio and sync endpoints run on a thread pool.. you can have websockets and this kind of stuff easily.. but anyway.. the ZODB needs to run on the thread pool.. perhaps this is just a clever idea (ZODB on a thread pool and the rest asyncio.. ).. that is the way Django is thinking of evolving to full async (ORM == ZODB).