Waitress and "Umlauts" with injected custom headers

My test setup: Shibboleth -> Apache -> Plone5-py3
My Apache config injects custom headers for the test (it simulates the Shibboleth environment):

....
  RewriteEngine On
  RequestHeader set X-DISPLAYNAME "Jörg Müller"
  RequestHeader set X-MAIL "jm@test.local"
  RequestHeader set X-EPPN "muellerj"
  RequestHeader set X-UNSCOPED-AFFILIATION "affiliate,member"
  RewriteCond %{REQUEST_URI} !^/(shibboleth-(sp|idp)|Shibboleth.sso|SAML)
  RewriteRule ^/mysite(.*) http://127.0.0.1:10085/VirtualHostBase/http/mysite.local:80/mysite/VirtualHostRoot/_vh_mysite/$1 [L,P]
....

If I dump the request in a view, I get the following:

{
....
 'HTTP_X_DISPLAYNAME': 'JÃ¶rg MÃ¼ller',
 'HTTP_X_EPPN': 'stuebnerj',
 'HTTP_X_FORWARDED_FOR': '127.0.0.1',
 'HTTP_X_FORWARDED_HOST': 'mysite.local',
 'HTTP_X_FORWARDED_SERVER': 'mysite.local',
 'HTTP_X_MAIL': 'jm@test.local',
 'HTTP_X_THEME_ENABLED': True,
 'HTTP_X_UNSCOPED_AFFILIATION': 'affiliate,member', 
 'wsgi.errors': <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>,
 'wsgi.file_wrapper': <class 'waitress.buffers.ReadOnlyFileBasedBuffer'>,
 'wsgi.input': <_io.BytesIO object at 0x7f0443b71308>,
 'wsgi.input_terminated': True,
 'wsgi.multiprocess': False,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)
....
}
displayname = self.request.get_header("HTTP_X_DISPLAYNAME")
print(type(displayname), displayname)
>> <class 'str'> JÃ¶rg MÃ¼ller

My question: can Waitress be configured to preserve the encoding? Or is this a bug? Or is something else wrong? Any ideas?

I found a solution:

self.request.get_header("HTTP_X_DISPLAYNAME").encode('latin-1').decode('utf-8')
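The one-liner above raises an exception when the value is missing or is not UTF-8 that was read as latin-1. A minimal sketch of a more defensive variant follows; the helper name `recode_header` and its fallback behaviour are illustrative assumptions, not part of Waitress, Zope or Plone:

```python
def recode_header(value, wire_encoding="utf-8"):
    """Undo a latin-1 decoding of a header value (hypothetical helper).

    Assumes the client sent raw UTF-8 bytes and the server decoded them
    as latin-1. Falls back to the unchanged value when that assumption
    does not hold.
    """
    if value is None:
        # Header not present in the request.
        return None
    try:
        # latin-1 maps code points 0-255 to bytes 1:1, so this recovers
        # the original bytes, which are then decoded as UTF-8.
        return value.encode("latin-1").decode(wire_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Value was not mis-decoded UTF-8; leave it alone.
        return value
```

In a view this could be called as `recode_header(self.request.get_header("HTTP_X_DISPLAYNAME"))`.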

AFAIK, header values should be ASCII only. If you need non-ASCII values, encode them according to RFC 2047.

But in the live system, the Shibboleth SP sends this stuff. :unamused:

:flushed::face_with_monocle::weary:

Looking at the WSGI implementation in Zope, there's a comment about the WSGI standard requiring latin-1.

@1letter according to Encoding and shibboleth · Issue #77 · rdmorganiser/rdmo · GitHub and https://shibboleth.atlassian.net/wiki/spaces/SHIB2/pages/2577072198/NativeSPContentSettings
you could try with:

ShibRequestSetting encoding URL

I ran into the same issue with Shibboleth SSO and the Waitress server.
It is a rather complicated problem.

I think our situation is this:

  1. HTTP header values should be ASCII or base64 encoded.
  2. The Shibboleth SP supplies raw UTF-8 text.
  3. The Waitress server decodes the header bytes to Unicode (Python 3 str) as "latin-1" *1

We will work around it like this:

request.environ.get("HTTP_X_DISPLAYNAME").encode('latin-1').decode('utf-8')

*1: waitress/parser.py at 9e0b8c801e4d505c2ffc91b891af4ba48af715e0 · Pylons/waitress · GitHub
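The three points above can be reproduced without any server. The following is only a simulation under the assumption that the bytes on the wire are UTF-8 and are decoded as latin-1 on arrival; it does not call into Waitress, and the variable names are illustrative:

```python
# What the Shibboleth SP (or the Apache test config) intends to send.
sent = "Jörg Müller"

# Step 2: the value travels as raw UTF-8 bytes in the HTTP header.
on_the_wire = sent.encode("utf-8")

# Step 3: the header bytes are decoded as latin-1, so each UTF-8 byte
# becomes its own character and the umlauts turn into two characters.
seen_by_app = on_the_wire.decode("latin-1")

# The workaround reverses step 3 and then decodes the bytes correctly.
restored = seen_by_app.encode("latin-1").decode("utf-8")

print(repr(seen_by_app))
print(repr(restored))
```

Because latin-1 maps every byte to exactly one code point, the round trip is lossless, which is why the workaround recovers the original text.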

I got some details wrong in my earlier post.
I tried adding the Shibboleth setting below:

<LocationMatch />
    ...
    ShibRequestSetting encoding URL
</LocationMatch>

With this setting, the multi-byte text arrives URL-encoded, so we can decode it with an unquote function. This works perfectly.
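A minimal sketch of such an unquote step, using `urllib.parse.unquote` from the Python standard library. The helper name `decode_shib_header` is hypothetical, and the sample value assumes the SP percent-encodes the UTF-8 bytes (spaces as `%20`); check what your SP actually sends:

```python
from urllib.parse import unquote


def decode_shib_header(value):
    """Decode a percent-encoded header value (hypothetical helper).

    Assumes the header was produced with `ShibRequestSetting encoding URL`,
    i.e. the UTF-8 bytes of the attribute are percent-encoded.
    """
    if value is None:
        # Header not present in the request.
        return None
    # Percent-encoded text is plain ASCII, so the latin-1 decoding done
    # by the server does not damage it. Invalid sequences are replaced
    # rather than raising.
    return unquote(value, encoding="utf-8", errors="replace")


print(decode_shib_header("J%C3%B6rg%20M%C3%BCller"))
```

Note that `unquote` leaves `+` untouched; if the values turn out to use `+` for spaces, `urllib.parse.unquote_plus` would be the function to use instead.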