Catalog rebuild shows errors after migration to Py3

When I rebuild the catalog, the following log entries are shown. Has anyone seen this error before? It only happens after the migration to Python 3. Is it possible that a library needed for indexing PDFs is missing?

2020-06-15 14:20:29,346 INFO    [Plone:533][waitress] Catalog Rebuilt
Total time: 3.122103691101074
Total CPU time: 2.98133
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Error: Unknown character collection 'PDFXC30-Identity'
Syntax Warning: Invalid Font Weight
Syntax Warning: Invalid Font Weight
Syntax Warning: Invalid Font Weight
Syntax Warning: Invalid Font Weight

This is likely related to the indexing of PDF files. If you can, try to find one of the affected PDF content items and convert it to TXT manually. Isn't the current PDF to TXT conversion done by pdftotext or pdftohtml (poppler-utils)?
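To make the manual check repeatable, here is a minimal sketch that runs `pdftotext` on a single file and counts the messages it writes to stderr. It assumes `pdftotext` from poppler-utils is on the PATH and that the PDF has been exported from the site to disk. The helper names `summarize_messages` and `check_pdf` are made up for illustration and are not part of Plone.

```python
import shutil
import subprocess
from collections import Counter


def summarize_messages(stderr_text):
    """Count each distinct non-empty line the converter wrote to stderr."""
    lines = (line.strip() for line in stderr_text.splitlines())
    return Counter(line for line in lines if line)


def check_pdf(path, command="pdftotext"):
    """Convert one PDF to text and return (text, message counts).

    Hypothetical helper: it only wraps the command-line tool and does
    not reproduce how Plone calls the converter internally.
    """
    if shutil.which(command) is None:
        raise RuntimeError(f"{command} not found on PATH; is poppler-utils installed?")
    # Passing "-" as the output file makes pdftotext write the text to stdout.
    result = subprocess.run(
        [command, path, "-"],
        capture_output=True,
        text=True,
        errors="replace",
        check=False,
    )
    return result.stdout, summarize_messages(result.stderr)
```

Calling `check_pdf("/tmp/exported.pdf")` (the path is only an example) on each suspect file should show which PDFs produce the `Unknown character collection` messages, and whether any text is still extracted despite them.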


Slightly off topic, but the warning about font weight might indicate that the wrong font is being used.

Font weight is what used to be called normal/regular, bold, thin, extra bold, etc.

These days (especially on the web), weights are often referenced as 100, 200, 300, etc., where 100 is 'very thin' and 900 is 'very bold' (heavy).
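As a rough illustration of that numeric scale, the sketch below maps the usual web (CSS) weight numbers to their common names. The names follow the CSS convention. Whether the PDF tool applies the same set of values when it prints the warning is only an assumption here, and `describe_weight` is a made-up helper.

```python
# Common CSS names for numeric font weights (web convention).
CSS_WEIGHT_NAMES = {
    100: "thin",
    200: "extra light",
    300: "light",
    400: "normal (regular)",
    500: "medium",
    600: "semi bold",
    700: "bold",
    800: "extra bold",
    900: "black (heavy)",
}


def describe_weight(value):
    """Return the common name for a weight, or 'non-standard' if unknown."""
    return CSS_WEIGHT_NAMES.get(value, "non-standard")
```

For example, `describe_weight(700)` gives `"bold"`, while a value such as `0` or `450` falls outside the table and is reported as `"non-standard"`.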

A wild guess: the original font is not present, so when a character is referenced and another font is substituted, that character cannot be used (since it does not exist in the substitute font).