Sessions and POSKeyErrors

I am creating a PAS plugin for a new Plone 5.1 site, which would pass username and password on to a different server for authentication. It stores a few items in the session, for now a session id from the auth server, and a full name.

I now run into problems when reading from the session. At first, all is fine, but when I go to a page with an error, for example simply a non-existing page triggering a 404 NotFound, it starts going wrong. Subsequent calls to normal pages, even just a javascript, give an exception:

2018-03-02T16:07:51 ERROR client.myplugin KeyError getting SESSION for request http://127.0.0.1:8080/Plone/++resource++protect.js
Traceback (most recent call last):
  File ".../myplugin/plugins.py", line 31, in get_session
    return request.SESSION
  File "/Users/maurits/shared-eggs/cp27m/Zope2-2.13.26-py2.7.egg/ZPublisher/HTTPRequest.py", line 1380, in __getattr__
    v = self.get(key, default, returnTaints=returnTaints)
  File "/Users/maurits/shared-eggs/cp27m/Zope2-2.13.26-py2.7.egg/ZPublisher/HTTPRequest.py", line 1337, in get
    v = v()
  File "/Users/maurits/shared-eggs/cp27m/Zope2-2.13.26-py2.7.egg/Products/Sessions/SessionDataManager.py", line 101, in getSessionData
    return self._getSessionDataObject(key)
  File "/Users/maurits/shared-eggs/cp27m/Zope2-2.13.26-py2.7.egg/Products/Sessions/SessionDataManager.py", line 188, in _getSessionDataObject
    ob = container.new_or_existing(key)
  File "/Users/maurits/shared-eggs/cp27m/Zope2-2.13.26-py2.7.egg/Products/Transience/Transience.py", line 839, in new_or_existing
    item = self.get(key, _marker)
  File "/Users/maurits/shared-eggs/cp27m/Zope2-2.13.26-py2.7.egg/Products/Transience/Transience.py", line 495, in get
    item = self._move_item(k, current_ts, default)
  File "/Users/maurits/shared-eggs/cp27m/Zope2-2.13.26-py2.7.egg/Products/Transience/Transience.py", line 314, in _move_item
    if abucket.get(k, None) is not None:
  File "/Users/maurits/shared-eggs/cp27m/ZODB-5.3.0-py2.7.egg/ZODB/Connection.py", line 796, in setstate
    p, serial = self._storage.load(oid)
  File "/Users/maurits/shared-eggs/cp27m/ZODB-5.3.0-py2.7.egg/ZODB/mvccadapter.py", line 143, in load
    r = self._storage.loadBefore(oid, self._start)
  File "/Users/maurits/shared-eggs/cp27m/tempstorage-4.0.1-py2.7.egg/tempstorage/TemporaryStorage.py", line 187, in loadBefore
    raise POSException.POSKeyError(oid)
POSKeyError: 0x13

In this case I actually catch the error and ignore it, so the visitor would not immediately notice, but she is no longer authenticated.

Sometimes it also just works. I don't know why. Maybe it only fails when a BTree in the session needs to be reshuffled or something.

In the traceback I first got a KeyError, but I updated tempstorage from 4.0 to 4.0.1 to see if that would fix it (see https://github.com/zopefoundation/tempstorage/issues/5), but that just turns the KeyError into a POSKeyError.

Strangely I get other tracebacks in between, which seem not related to any request:

2018-03-02T16:07:51 ERROR ZODB.Connection Couldn't load state for BTrees.OOBTree.OOBTree 0x13
Traceback (most recent call last):
  File "/Users/maurits/shared-eggs/cp27m/ZODB-5.3.0-py2.7.egg/ZODB/Connection.py", line 796, in setstate
    p, serial = self._storage.load(oid)
  File "/Users/maurits/shared-eggs/cp27m/ZODB-5.3.0-py2.7.egg/ZODB/mvccadapter.py", line 143, in load
    r = self._storage.loadBefore(oid, self._start)
  File "/Users/maurits/shared-eggs/cp27m/tempstorage-4.0.1-py2.7.egg/tempstorage/TemporaryStorage.py", line 187, in loadBefore
    raise POSException.POSKeyError(oid)
POSKeyError: 0x13

I created a Python Script that ought to trigger it, even in a fresh Zope database:

print 'setting value in session'
session = context.REQUEST.SESSION
# Same:
# sdm = context.session_data_manager
# session = sdm.getSessionData(create=True)

print session.get('foobar')
session.set('foobar', 'snap')
print session.get('foobar')
# To temporarily trigger an error, uncomment the next line:
# 1/0
return printed

In theory, calling this script once, should work fine. Then visit a non existing page, to trigger a 404 exception, or uncomment the 1/0 for a ZeroDivisionError. Then call the script again, and it should trigger a POSKeyError. I saw that in the Plone coredev buildout too, so without my PAS plugin:

2018-03-02 13:49:03 ERROR Zope.SiteErrorLog 1519994943.60.99781176981 http://127.0.0.1:8080/testsession
Traceback (innermost last):
  Module ZPublisher.Publish, line 138, in publish
  Module ZPublisher.mapply, line 77, in mapply
  Module ZPublisher.Publish, line 48, in call_object
  Module Shared.DC.Scripts.Bindings, line 322, in __call__
  Module Shared.DC.Scripts.Bindings, line 359, in _bindAndExec
  Module Products.PythonScripts.PythonScript, line 344, in _exec
  Module script, line 2, in testsession
   - <PythonScript at /testsession>
   - Line 2
  Module ZPublisher.HTTPRequest, line 1380, in __getattr__
  Module ZPublisher.HTTPRequest, line 1337, in get
  Module Products.Sessions.SessionDataManager, line 101, in getSessionData
  Module Products.Sessions.SessionDataManager, line 188, in _getSessionDataObject
  Module Products.Transience.Transience, line 839, in new_or_existing
  Module Products.Transience.Transience, line 495, in get
  Module Products.Transience.Transience, line 314, in _move_item
  Module ZODB.Connection, line 796, in setstate
  Module ZODB.mvccadapter, line 143, in load
  Module tempstorage.TemporaryStorage, line 187, in loadBefore
POSKeyError: 0x12

Except: it doesn't always trigger. So it's tricky. Maybe it only goes wrong when the same browser is doing too many requests (for css/js in dev mode) in short succession. But once it is triggered, every request that tries to access the session will fail, until you restart Zope.

I couldn't trigger it in Plone 4.3 and 5.0 at all. It could be that this is because these still use older ZODB versions.
In Plone 5.2 I got an AttributeError for request.SESSION, because the session machinery (see http://zope.readthedocs.io/en/latest/zope2book/Sessions.html#default-configuration) is apparently not setup in Zope 4.

Am I using sessions wrongly?

Should I be using plone.session instead? But I don't quite see how I would use that to write something to a session without using the exact same Zope session machinery that is failing me.

I may need to switch to storing some stuff in a cookie instead.

do you know Products.BeakerSessionDataManager? it may help you on your endevour.

I'm not sure about the problem with sessions here, but I would recommend using a (signed) cookie for this so that you don't need to worry about sticky sessions when using multiple ZEO clients.

Doing this webfront-side is still a good idea for cache coherency.

That's a nice one. Since ZODB5 is a major change, I'd ping ZODB devs for pointers.

Btw, this doesn't happen on 5.0 right?

That's right:

I have the same problem here.
My usecase is:

  • zeoclients spread in two servers balanced by a load balancer
  • on every server, zeoinstances are balanced by haproxy that set a cookie for sticky session
  • everytime an anonymous try to access a content in the site, he is redirected to the external service that returns an auth cookie, and redirects the users to the logged_in script with a came_from value.

Randomly i have a PosKeyError on an instance after login raised by tempstorage.
And that instance is broken until i restart it.

@mauritsvanrees did you understand what could cause this problem? Have you fixed it somehow?

Possibly completely unrelated, but this was a question earlier this year also dealing with tempstorage issues?

I don't know if could be related to my problem, but: who sets _ZopeId cookie?
I've two different environments with the same buildout and very similar products, but in one of them i see that after the login, there is an extra _ZopeId cookie in the browser.

After some code-digging i understand where it happens (getBrowserId method) and his stacktrace:

(Pdb) w
eggs/Zope2-2.13.26-py2.7.egg/ZServer/PubCore/ZServerPublisher.py(31)__init__()
-> response=b)
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/Publish.py(455)publish_module()
-> environ, debug, request, response)
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/Publish.py(249)publish_module_standard()
-> response = publish(request, module_name, after_list, debug=debug)
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/Publish.py(138)publish()
-> request, bind=1)
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/mapply.py(77)mapply()
-> if debug is not None: return debug(object,args,context)
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/Publish.py(48)call_object()
-> result=apply(object,args) # Type s<cr> to step into published object.
eggs/Products.CMFFormController-3.1.5-py2.7.egg/Products/CMFFormController/FSControllerPageTemplate.py(91)__call__()
-> return self._call(FSControllerPageTemplate.inheritedAttribute('__call__'), *args, **kwargs)
eggs/Products.CMFFormController-3.1.5-py2.7.egg/Products/CMFFormController/BaseControllerPageTemplate.py(29)_call()
-> return self.getNext(controller_state, REQUEST)
eggs/Products.CMFFormController-3.1.5-py2.7.egg/Products/CMFFormController/ControllerBase.py(232)getNext()
-> return next_action.getAction()(controller_state)
eggs/Products.CMFFormController-3.1.5-py2.7.egg/Products/CMFFormController/Actions/TraverseTo.py(38)__call__()
-> REQUEST, bind=1)
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/mapply.py(77)mapply()
-> if debug is not None: return debug(object,args,context)
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/Publish.py(48)call_object()
-> result=apply(object,args) # Type s<cr> to step into published object.
eggs/Products.CMFFormController-3.1.5-py2.7.egg/Products/CMFFormController/FSControllerPythonScript.py(104)__call__()
-> result = FSControllerPythonScript.inheritedAttribute('__call__')(self, *args, **kwargs)
eggs/Products.CMFFormController-3.1.5-py2.7.egg/Products/CMFFormController/Script.py(145)__call__()
-> return BaseFSPythonScript.__call__(self, *args, **kw)
eggs/Products.CMFCore-2.2.12-py2.7.egg/Products/CMFCore/FSPythonScript.py(127)__call__()
-> return Script.__call__(self, *args, **kw)
eggs/Zope2-2.13.26-py2.7.egg/Shared/DC/Scripts/Bindings.py(322)__call__()
-> return self._bindAndExec(args, kw, None)
eggs/Zope2-2.13.26-py2.7.egg/Shared/DC/Scripts/Bindings.py(359)_bindAndExec()
-> return self._exec(bound_data, args, kw)
eggs/Products.PythonScripts-2.13.2-py2.7.egg/Products/PythonScripts/PythonScript.py(344)_exec()
-> result = f(*args, **kw)
Script (Python)(8)logged_in()
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/BaseRequest.py(308)items()
-> result.append((k, get(k)))
eggs/Zope2-2.13.26-py2.7.egg/ZPublisher/HTTPRequest.py(1337)get()
-> v = v()
eggs/Zope2-2.13.26-py2.7.egg/Products/Sessions/SessionDataManager.py(99)getSessionData()
-> key = self.getBrowserIdManager().getBrowserId(create=create)
eggs/Zope2-2.13.26-py2.7.egg/Products/Sessions/BrowserIdManager.py(158)getBrowserId()
-> bid = getattr(REQUEST, 'browser_id_', None)

but i don't know why in one case is used and in the other not. Any suggestions?

The balancer that tries to set a sticky session?

@alert i have that cookie also trying to access a single instance

Some session pas plugin?

Products.Sessions. / zope2.sessioncookie

https://github.com/search?q=org%3Azopefoundation+_zopeid&type=Code

No, I don't understand it, and I haven't fixed it. I worked around it by using cookies to store what I need, which is fine for my use case.

I did report it in several posts to the zodb developers list, and then created an issue based on that: POSKeyError when getting session · Issue #4 · zopefoundation/Products.Sessions · GitHub. Neither of those received any comments.
Summary of my findings:

  • You can set the Z_TOC_DEBUG environment variable (just to 1 I think) to get debug output from Products.Transcience.
  • The error happens in Products.Transience in _do_gc_work.
  • There it goes wrong when calling self. _last_finalized_timeslice(), which is an instance of a very simple Persistent class in Transience.py, where a call simply gets its own value from the database (probably the temporary storage).

When I say 'the error happens in Products.Transience', I really mean: this is where the error surfaces. Maybe this is the correct code part that needs fixing, but maybe the real cause of the error is somewhere else.

Interestingly, about a week ago Jim Fulton opened an issue in the ZODB tracker, calling for "Better POSKeyError handling":

Thanks @fredvd and @mauritsvanrees for these references!

I tried to dig a bit yesterday, without success.

I tried also to follow the traces of who generates _ZopeId cookie, again without any lucky. I mean, i know where this is done, but i don't understand why in some cases this cookie is generated and in other ones he isn't.
In that environment, after the login (i think when Plone sets the REQUEST for the logged_in script), SessionDataManger get called and sets the cookie. In another environment (with similar buildout), not.

I tried to disable the generation of that cookie (in SessionDataManager) and this seems solved the problem..ugly patch but it works

Is the session data perhaps mounted on the zeo server?

As TemporaryStorage is supposed to be RAM only...

which this should do, as adding to self._index wouldn't result in a DB write persée

Even if it is written self. _conflict_cache can be empty if GC has run, which results in the POSKeyError.

I'm just rambling, maybe this contains some truth somewhere.

Yesterday I've been struck by the same issue. I was only able to reproduce it in Plone 5.1.x. In the short term we might switch to using a cookie or beaker as a workaround. But since there are probably a lot of applications that rely on this we should investigate some more. Maybe at the conference, when we can corner Jim?

@cekk In a similar situation, we used collective.beaker but we had a standard session manager object. This caused the same problem: A call to a DTML page (used to signal "UP" to our varnish load balancer) resulted in a Zope session cookie, and a Transient object. This brought our site down after 3000 seconds (the varnish health check interval being one second).
A similar site with collective.beaker which had the proper "beaker session manager" object did not have this problem. The solution to replace the session manager object with a beaker session manager resolved the problem for us.

2 Likes