We encountered the exact same issue on Plone 5.2.11 on Python 2.7.18 and figured out the following:
The issue happens in Paste-3.5.2-py2.7.egg/paste/translogger.py at line 99:
def write_log(self, environ, method, req_uri, start, status, bytes):
if bytes is None:
bytes = '-'
if time.daylight:
offset = time.altzone / 60 / 60 * -100
else:
offset = time.timezone / 60 / 60 * -100
if offset >= 0:
offset = "+%0.4d" % (offset)
elif offset < 0:
offset = "%0.4d" % (offset)
remote_addr = '-'
if environ.get('HTTP_X_FORWARDED_FOR'):
remote_addr = environ['HTTP_X_FORWARDED_FOR']
elif environ.get('REMOTE_ADDR'):
remote_addr = environ['REMOTE_ADDR']
d = {
'REMOTE_ADDR': remote_addr,
'REMOTE_USER': environ.get('REMOTE_USER') or '-',
'REQUEST_METHOD': method,
'REQUEST_URI': req_uri,
'HTTP_VERSION': environ.get('SERVER_PROTOCOL'),
'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,
'status': status.split(None, 1)[0],
'bytes': bytes,
'HTTP_REFERER': environ.get('HTTP_REFERER', '-'),
'HTTP_USER_AGENT': environ.get('HTTP_USER_AGENT', '-'),
}
-> message = self.format % d
self.logger.log(self.logging_level, message)
In Germany, it happens precisely only in march, as it is the only month that contains an unicode character ä, but more on that in a moment.
The contents of d is the following for me:
{'HTTP_REFERER': '-',
'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
'HTTP_VERSION': 'HTTP/1.1',
'REMOTE_ADDR': '127.0.0.1',
'REMOTE_USER': u'admin',
'REQUEST_METHOD': 'GET',
'REQUEST_URI': '/',
'bytes': '3222',
'status': '200',
'time': '11/M\xc3\xa4r/2023:11:48:46 +0200'}
and self.format is this:
'%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s] "%(REQUEST_METHOD)s %(REQUEST_URI)s %(HTTP_VERSION)s" %(status)s %(bytes)s "%(HTTP_REFERER)s" "%(HTTP_USER_AGENT)s"'
Now the funny part. While this works:
>>> '%s' % time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset
'11/M\xc3\xa4r/2023:11:48:46 +0200'
this does not:
'%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s]' % {'REMOTE_ADDR': remote_addr, 'REMOTE_USER': environ.get('REMOTE_USER') or '-', 'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,}
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
So the thing that makes it fail is in this case the REMOTE_USER that comes in as a unicode string:
>>> environ.get('REMOTE_USER')
u'admin'
This seems to convert the whole thing into unicode and causes the error.
Once again:
>>> '%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s]' % {'REMOTE_ADDR': remote_addr, 'REMOTE_USER': u'admin', 'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,}
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
and here the user as a string:
'%(REMOTE_ADDR)s - %(REMOTE_USER)s [%(time)s]' % {'REMOTE_ADDR': remote_addr, 'REMOTE_USER': 'admin', 'time': time.strftime('%d/%b/%Y:%H:%M:%S ', start) + offset,}
'127.0.0.1 - admin [11/M\xc3\xa4r/2023:11:48:46 +0200]'
Because of the the time being encoded already as UTF8, this makes the whole thing break in March if you are in Germany März or in Barcelona (Catalan) març.
On the other 11 months it does not happen...
However, what we noted is that the user seems to not always come in as a unicode string. But we did not yet debug it into detail.
Hopefully we find it out and create a PR for it.
Cheers, Ramon