We encountered the exact same issue on Plone 5.2.11 on Python 2.7.18 and figured out the following:
The issue happens in Paste-3.5.2-py2.7.egg/paste/translogger.py
at line 99:
def write_log(self, environ, method, req_uri, start, status, bytes):
if bytes is None:
bytes = '-'
if time.daylight:
offset = time.altzone / 60 / 60 * -100
else:
offset = time.timezone / 60 / 60 * -100
if offset >= 0:
offset = "+%0.4d" % (offset)
elif offset < 0:
offset = "%0.4d" % (offset)
remote_addr = '-'
if environ.get('HTTP_X_FORWARDED_FOR'):
remote_addr = environ['HTTP_X_FORWARDED_FOR']
elif environ.get('REMOTE_ADDR'):
remote_addr = environ['REMOTE_ADDR']
d = {
'REMOTE_ADDR': remote_addr,
'REMOTE_USER': environ.get('REMOTE_USER') or '-',
'REQUEST_METHOD': method,
'REQUEST_URI': req_uri,
'HTTP_VERSION': environ.get('SERVER_PROTOCOL'),
'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,
'status': status.split(None, 1)[0],
'bytes': bytes,
'HTTP_REFERER': environ.get('HTTP_REFERER', '-'),
'HTTP_USER_AGENT': environ.get('HTTP_USER_AGENT', '-'),
}
-> message = self.format % d
self.logger.log(self.logging_level, message)
In Germany, it happens precisely only in march, as it is the only month that contains an unicode character ä
, but more on that in a moment.
The contents of d
is the following for me:
{'HTTP_REFERER': '-',
'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
'HTTP_VERSION': 'HTTP/1.1',
'REMOTE_ADDR': '127.0.0.1',
'REMOTE_USER': u'admin',
'REQUEST_METHOD': 'GET',
'REQUEST_URI': '/',
'bytes': '3222',
'status': '200',
'time': '11/M\xc3\xa4r/2023:11:48:46 +0200'}
and self.format
is this:
'%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s] "%(REQUEST_METHOD)s %(REQUEST_URI)s %(HTTP_VERSION)s" %(status)s %(bytes)s "%(HTTP_REFERER)s" "%(HTTP_USER_AGENT)s"'
Now the funny part. While this works:
>>> '%s' % time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset
'11/M\xc3\xa4r/2023:11:48:46 +0200'
this does not:
'%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s]' % {'REMOTE_ADDR': remote_addr, 'REMOTE_USER': environ.get('REMOTE_USER') or '-', 'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,}
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
So the thing that makes it fail is in this case the REMOTE_USER
that comes in as a unicode string:
>>> environ.get('REMOTE_USER')
u'admin'
This seems to convert the whole thing into unicode and causes the error.
Once again:
>>> '%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s]' % {'REMOTE_ADDR': remote_addr, 'REMOTE_USER': u'admin', 'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,}
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
and here the user as a string:
'%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s]' % {'REMOTE_ADDR': remote_addr, 'REMOTE_USER': 'admin', 'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,}
'127.0.0.1 - admin [11/M\xc3\xa4r/2023:11:48:46 +0200]'
Because of the the time being encoded already as UTF8
, this makes the whole thing break in March if you are in Germany März
or in Barcelona (Catalan) març
.
On the other 11 months it does not happen...
However, what we noted is that the user seems to not always come in as a unicode string. But we did not yet debug it into detail.
Hopefully we find it out and create a PR for it.
Cheers, Ramon