Threading - ZODB.Connection Shouldn't load state for 0x5d6933 when the connection is closed

I have a form that, when submitted, requires a very expensive call to other servers to get more data. I don't want the page spinning until it's done, so I'm trying to create a new thread for that part. It looks roughly like this:

import threading

from plone import api


def foo(portal, data):
    # get a bunch of data here
    catalog = api.portal.get_tool('portal_catalog')
    catalog.uniqueValuesFor('someindex')
    # ...


thr = threading.Thread(target=foo, args=(api.portal.get(), data), kwargs={})
thr.start()

The problem is that the call to uniqueValuesFor raises an error. Traceback snippet:

  File "c:\plone5\eggs\products.zcatalog-3.0.2-py2.7.egg\products\ZCatalog\ZCatalog.py", line 513, in uniqueValuesFor
    return self._catalog.uniqueValuesFor(name)
  File "c:\plone5\eggs\products.zcatalog-3.0.2-py2.7.egg\products\ZCatalog\Catalog.py", line 402, in uniqueValuesFor
    return tuple(self.getIndex(name).uniqueValues())
  File "c:\plone5\eggs\products.zcatalog-3.0.2-py2.7.egg\products\PluginIndexes\common\UnIndex.py", line 503, in uniqueValues
    for key in self._index.keys():
  File "c:\plone5\eggs\zodb3-3.10.5-py2.7-win-amd64.egg\ZODB\Connection.py", line 857, in setstate
    raise ConnectionStateError(msg)
ConnectionStateError: Shouldn't load state for 0x5d6933 when the connection is closed

It doesn't look like an error in the code per se, but in the DB connection pool and threading. Some other calls, such as objectValues(), also result in an error. Is threading just a bad idea here, or is there something I can do about the connection state?

This is on Plone 5.0.5, a simple local build using ZODB and a single Zope client.

You must not pass a persistent object across thread boundaries. The ZODB relies on each thread having its own local copy of a persistent object: if you pass a persistent object across a thread boundary, then the destination thread uses the copy of the source thread and this copy can change in unexpected ways (as it is under external control). The ConnectionStateError is one of the potential results thereof.

To provide a foreign thread with a persistent object, that thread must load the object itself from the ZODB, via its own ZODB connection. You could have a look at dm.zodb.asynchronous; it contains tools that help with doing work in a separate thread. One of those tools is a "persistent context", which lets you pass persistent objects across thread boundaries. The context does not pass on the object itself: it remembers the object's database and its "oid", and in the destination thread it opens a connection to that database (if necessary) and loads the object anew via its "oid". Note that the resulting object is not acquisition wrapped; if the acquisition context is important, then you must pass on the access path and recreate the proper acquisition wrapper by traversing this path.
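To make the mechanism concrete, here is a rough hand-rolled sketch of the same idea using only stock Zope 2 APIs rather than dm.zodb.asynchronous: the worker thread opens its own connection via Zope2.app(), re-traverses to the portal by path, and manages its own transaction. worker, portal_path and data are placeholder names, not anything from the package.

import threading
import transaction
import Zope2

from plone import api


def worker(portal_path, data):
    # Open a fresh ZODB connection owned by this thread instead of
    # reusing persistent objects from the request thread.
    app = Zope2.app()
    try:
        # Re-load the portal via its path; this also rebuilds acquisition.
        portal = app.unrestrictedTraverse(portal_path)
        catalog = portal.portal_catalog
        catalog.uniqueValuesFor('someindex')
        # ... do the expensive work here ...
        transaction.commit()
    except Exception:
        transaction.abort()
        raise
    finally:
        app._p_jar.close()  # close this thread's connection


# In the form handler: pass only plain data (a path string), never the
# persistent object itself.
portal_path = '/'.join(api.portal.get().getPhysicalPath())
threading.Thread(target=worker, args=(portal_path, data)).start()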

That is interesting. So when the transaction with the form handler commits, it starts this asynchronous process. It looks like, in addition to recreating acquisition, I also need to recreate the security manager? I think I can do that just by also passing the user id. Here's what I have:

from AccessControl.SecurityManagement import newSecurityManager
from dm.zodb.asynchronous.zope2 import PersistentContext, transactional
from dm.zodb.asynchronous.scheduler import TransactionalScheduler
from plone import api
from zope.component.hooks import setSite


def set_site(context, portal_path, user_id):
    app = context['app']
    portal = app.restrictedTraverse(portal_path)  # get acquisition

    setSite(portal)
    admin = api.user.get(user_id)
    newSecurityManager(None, admin)


@transactional
def async_import(context, portal_path, user_id, data, pmid_to_reg):
    set_site(context, portal_path, user_id)
    # do stuff here

And the form handler does:

scheduler = TransactionalScheduler()
portal_path = '/'.join(api.portal.get().getPhysicalPath())
scheduler.schedule(
    async_import,
    PersistentContext(app=self.context.unrestrictedTraverse('/')),
    portal_path,
    api.user.get_current().getId(),
    data,
    pmid_to_reg,
)

For my purposes I think I would also need to pass on some request info as well, because portal.absolute_url() just comes back equal to portal.getId().

This depends on what your thread actually wants to do.

Moving to a separate thread, you lose the complete request context: the request object itself, its user, its site context, its "layer", its access path, ...

Depending on the thread's task, part of this context must be recreated. Your set_site recreates the acquisition context (note: you likely should use unrestrictedTraverse rather than restrictedTraverse -- as there is no user yet) and part of the site context. There is still no request, no layer, ... -- which might or might not be necessary depending on the thread's task.
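If the thread does need something request-like, one possible approach (a sketch only, using Zope's Testing.makerequest test helper and zope.globalrequest; set_site_with_request is a made-up name) is:

import Zope2

from Testing.makerequest import makerequest
from zope.component.hooks import setSite
from zope.globalrequest import setRequest


def set_site_with_request(portal_path):
    # Wrap the app root in a minimal fake REQUEST so code that expects
    # self.REQUEST (or zope.globalrequest) has something to work with.
    # The fake request carries no real host, layer or form data.
    app = makerequest(Zope2.app())
    setRequest(app.REQUEST)
    portal = app.unrestrictedTraverse(portal_path)
    setSite(portal)
    return portal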

Resurrecting this. My original requirements changed so I no longer needed this, but I find myself needing something similar in a new project.

My issue is that the request object is needed by some unrelated event handlers called by the actions my thread performs. Is there an appropriate way to recreate the REQUEST object?

I used https://github.com/collective/collective.taskqueue to do async stuff in Plone land, if my memory serves me well.
Maybe this would work for you?

Definitely looks worth looking into.

By default, taskqueue.add copies headers from the current request to the asynchronous request. That should be enough to authenticate the request in exactly the same way as the current request was authenticated.
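For reference, queueing the asynchronous request is roughly a one-liner; the @@process-upload view name is a placeholder:

from collective.taskqueue import taskqueue

# Queues an asynchronous request against the given path; by default the
# current request's headers (and thus its authentication) are copied over.
taskqueue.add('/Plone/@@process-upload')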

Would this include form data? My particular use case here is that a user uploads a file containing several records of publication data. I won't get into the details, but this can take several minutes to complete in a way that seems unavoidable (a bottleneck is PubMed requests). In some cases they get a gateway timeout, think the upload has failed, and re-upload. I'm trying to offload this to a separate thread that will email them the result, while the original thread simply returns a message informing them the upload was successful and in progress.

It shouldn't be a problem to create another view registered to the ITaskQueueLayer layer, assuming the form data gets passed or there is a way to pass it. I will try.

If you compute the dynamic bits (i.e. obj.absolute_url()) up front (on submit, that is), then you can fire-and-forget a thread to upload the data outside the current request. No need for fancy things here, I think.

It is the request headers only, not the form data.
IOW, you'll have to save the data to some database first. If you're going to use Redis as the taskqueue back end, then you could use the same Redis installation to do that as well.
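A rough sketch of that hand-off, assuming the redis-py client and made-up key names (the queued view would later fetch the payload by key):

import uuid
import redis

client = redis.StrictRedis(host='localhost', port=6379, db=0)


def stash_upload(file_data):
    # Stash the uploaded bytes under a temporary key; only the key needs
    # to reach the asynchronous request.
    key = 'upload:%s' % uuid.uuid4().hex
    client.setex(key, 3600, file_data)  # expire in case the task never runs
    return key


def fetch_upload(key):
    # Called from the asynchronous view to retrieve and clean up the payload.
    data = client.get(key)
    client.delete(key)
    return data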

If you just need to defer uploading the data, then you could fire-and-forget a separate thread to do just that. Remember that if the current transaction fails and is retried, you'll upload the document at least twice, because a new thread is started on each attempt.
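One way to avoid that double upload (a sketch of my own, not something from the thread) is to start the thread only from an after-commit hook, so nothing runs unless the transaction actually committed:

import threading
import transaction


def upload(data):
    # ... push the data to the external service ...
    pass


def start_upload_after_commit(data):
    def hook(committed, data):
        # `committed` is True only if the transaction really committed,
        # so aborted or retried requests never spawn the thread.
        if committed:
            threading.Thread(target=upload, args=(data,)).start()
    transaction.get().addAfterCommitHook(hook, args=(data,))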

I originally tried to simply fire and forget a new thread, but as Dieter points out, the new thread loses the connection state.

I'm thinking the sanest approach is probably not to try to do anything server side at all. Just add some JS so that the form submits asynchronously and a message is displayed, and leave the server side as is.

Yes, you'll need to prepare the data so the new thread only needs to push the data further.
A JS solution might indeed be easier.