Mimetype registry error with Python >= 3.11

In Python 3.11 there was a change in the accepted patterns for regular expressions.

In re Regular Expression Syntax, global inline flags (e.g. (?i)) can now only be used at the start of regular expressions. Using them elsewhere has been deprecated since Python 3.6. (Contributed by Serhiy Storchaka in bpo-47066.)

See: What’s New In Python 3.11 — Python 3.13.1 documentation

This made loading some globs in the mimetypes registry impossible.
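
To see the change in isolation, here is a minimal sketch that compiles the old-style pattern directly. The helper name `try_compile` is just something I made up for illustration; the pattern string is the one stored in the registry on the old site.

```python
import re
import sys
import warnings

# Pattern string as stored on the old site (trailing global flags).
LEGACY_PATTERN = r"AUTHORS\Z(?ms)"


def try_compile(pattern):
    """Return (compiled_ok, message) for the given pattern string."""
    try:
        with warnings.catch_warnings():
            # Python 3.6 to 3.10 only emit a DeprecationWarning for this.
            warnings.simplefilter("ignore", DeprecationWarning)
            re.compile(pattern)
    except re.error as exc:
        # Python 3.11 and later reject the pattern outright.
        return False, str(exc)
    return True, "compiled"


ok, message = try_compile(LEGACY_PATTERN)
print(sys.version_info[:2], ok, message)
```

On an interpreter older than 3.11 this should report that the pattern compiled; on 3.11 or later it should report the same `re.error` shown in the traceback further down.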

This is what I have in Python 3.8:

>>> mtr.globs["AUTHORS"]
(re.compile('AUTHORS\\Z(?ms)', re.MULTILINE|re.DOTALL), <mimetype text/x-authors>)

While in Python 3.11 the pickled object cannot be loaded:

>>> mtr.globs["AUTHORS"]
ERROR:ZODB.Connection:Couldn't load state for BTrees.OOBTree.OOBucket 0x0179d4
...
re.error: global flags not at the start of the expression at position 9
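
As far as I understand it, the load fails because a pickled compiled pattern only carries the pattern source and its flags, and the pattern is compiled again when the pickle is loaded. A small sketch with the standard `pickle` module (not ZODB itself, which I assume behaves the same way here) shows the round trip with a pattern that is still valid:

```python
import pickle
import re

compiled = re.compile(r"(?s:AUTHORS)\Z")
payload = pickle.dumps(compiled)

# The payload contains the pattern source, not any compiled state.
print(b"AUTHORS" in payload)

# Loading recompiles the pattern from that source string, so a source
# that the running interpreter rejects cannot be restored.
restored = pickle.loads(payload)
print(restored.pattern == compiled.pattern, restored.flags == compiled.flags)
```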

On a more recent site, I see a different pattern:

>>> mtr.globs["AUTHORS"]
(re.compile('(?s:AUTHORS)\\Z'), <mimetype text/x-authors>)

The code responsible for creating the pattern is this:

(Pdb++) l
 141         @security.protected(ManagePortal)
 142         def register_glob(self, glob, mimetype):
 143             """Associate a glob to a IMimetype
 144
 145             glob is a shell-like glob that will be translated to a regex
 146             to match against whole filename.
 147             mimetype must implement IMimetype.
 148             """
 149             globs = getattr(self, "globs", None)
 150             if globs is None:
 151                 self.globs = globs = OOBTree()
 152             mimetype = aq_base(mimetype)
 153             existing = globs.get(glob)
 154             if existing is not None:
 155                 regex, mt = existing
 156                 if mt != mimetype:
 157                     logger.warning(f"Redefining glob {glob} from {mt} to {mimetype}")
 158             # we don't validate fmt yet, but its ["txt", "html"]
 159             pattern = re.compile(fnmatch.translate(glob))
 160             breakpoint()
 161  ->         globs[glob] = (pattern, mimetype)
(Pdb++) glob
'*.a26'
(Pdb++) fnmatch.translate(glob)
'(?s:.*\\.a26)\\Z'

fnmatch.translate yields different results in Python 2.7 and Python 3:

[ale@flo ~]$ pyenv shell 2.7; python -c 'import fnmatch; print(fnmatch.translate("foo"))'
foo\Z(?ms)
[ale@flo ~]$ pyenv shell 3.6; python -c 'import fnmatch; print(fnmatch.translate("foo"))'
(?s:foo)\Z
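
A quick way to tell the two forms apart is to look at the end of the pattern string. This is only a sketch: `is_legacy_pattern` is a name I made up, and I deliberately do not rely on the exact output of `fnmatch.translate`, since it may differ between Python 3 releases.

```python
import fnmatch
import re

# Suffix produced by Python 2.7's fnmatch.translate, as in the output above.
LEGACY_SUFFIX = r"\Z(?ms)"


def is_legacy_pattern(pattern):
    """True if the pattern string ends with the old trailing global flags."""
    return pattern.endswith(LEGACY_SUFFIX)


for glob in ("AUTHORS", "*.a26"):
    translated = fnmatch.translate(glob)
    # Compiling proves the current interpreter accepts its own translation.
    re.compile(translated)
    print(glob, translated, is_legacy_pattern(translated))
```

Whatever Python 3 version runs this, the freshly translated patterns should not be flagged as legacy, while a stored string such as `foo\Z(?ms)` is.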

The site was created with Python 2.7 (or maybe even before).

Did I miss an upgrade step?

I do not think so, because I have the same issue in a totally different site which was created with Python 2.7.

Did anyone have the same issue?

I came up with this instance script (that has to run with Python < 3.11):

import fnmatch
import logging
import transaction


logger = logging.getLogger("fix_mtr_re")
logger.setLevel(logging.INFO)

sites = app.objectValues("Plone Site")  # type: ignore

for site in sites:
    logger.info("Checking site %r", site)
    registry = site.mimetypes_registry
    globs = registry.globs

    # Snapshot the keys first: register_glob writes back into this same BTree.
    for glob in list(globs):
        compiled_pattern, mimetype = globs[glob]
        # Since 2009, fnmatch.translate in Python 2
        # has returned a string ending with `\Z(?ms)`.
        # See https://github.com/python/cpython/commit/b98d6b2cbcba1344609a60c7c0fb9f595d19023b
        if compiled_pattern.pattern.endswith(r"\Z(?ms)"):
            logger.info("  ⚡ Re-registering glob %r (%r -> %r)", glob, compiled_pattern.pattern, fnmatch.translate(glob))
            registry.register_glob(glob, mimetype)
        else:
            logger.info("  ✅ Not touching glob %r (%r)", glob, compiled_pattern.pattern)

transaction.commit()

@alert I have a Plone site where I indeed get a traceback when I start bin/instance debug and access app.Plone.mimetypes_registry.globs["AUTHORS"]. But what would I do to trigger a traceback in the Plone UI? I tried uploading a file called AUTHORS, but that gave no error.

I have custom code that calls Products.MimetypesRegistry.MimeTypesRegistry.globFilename.

I am not sure this is used in vanilla Plone.

I made a couple of PRs for plone.app.upgrade to fix this: