I have a situation where a user uploads a file to one server, which performs some pre-processing on it. The text of the file is examined in order to generate tags that will be incorporated into the file-like object. Once the user has had the opportunity to add some extra metadata (via a web interface), the resulting title, summary, subjects (tags), and the PDF file are posted to Plone 6 via an API call:
import requests

plone_response = requests.post(
    'https://texasbusinesslaw.org/content-staging',
    headers={
        'Accept': 'application/json',
        'Content-Type': 'application/json',
    },
    json={
        '@type': 'Material',
        'title': basic_data['title'],
        'description': basic_data['description'],
        'members_only': basic_data['members_only'],
        'file': file_dict,  # the dict returned by file_upload(), passed as-is
        'text': basic_data['text'],
        'subjects': tags_to_tuple(tags=basic_data['tags']),
    },
    auth=('admin', 'secret'),
)
The call never returns a response. Instead, it fails with the following traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 473, in prepare_body
body = complexjson.dumps(json, allow_nan=False)
File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 385, in dumps
return cls(
File "/usr/lib/python3/dist-packages/simplejson/encoder.py", line 296, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3/dist-packages/simplejson/encoder.py", line 378, in iterencode
return _iterencode(o, 0)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 10: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ronc/.local/lib/python3.8/site-packages/pywebio/session/threadbased.py", line 86, in main_task
target()
File "materials.py", line 84, in main
plone_response = requests.post('https://texasbusinesslaw.org/content-staging', headers={ 'Accept': 'application/json', 'Content-Type': 'application/json', },
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 117, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 515, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 443, in prepare_request
p.prepare(
File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 321, in prepare
self.prepare_body(data, files, json)
File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 475, in prepare_body
raise InvalidJSONError(ve, request=self)
requests.exceptions.InvalidJSONError: 'utf-8' codec can't decode byte 0xf6 in position 10: invalid start byte
I suspect a problem with UTF-8 and non-printable characters. Note that the code used to upload the file into memory is:
file_dict = file_upload(label="Step 1: Select the PDF File Containing the Materials: ", multiple=False, accept=['.pdf'], required=True)
filename = file_dict['filename']
f = file_dict['content']
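To check that suspicion, here is a minimal sketch that tries to decode the start of a PDF as UTF-8 and reports where it fails. The sample bytes are a made-up stand-in for file_dict['content'] (an assumption: a PDF whose second line is a comment of high-bit bytes, which many PDF writers emit to mark the file as binary), and first_utf8_error is a hypothetical helper, not part of any library.

```python
# Hypothetical stand-in for file_dict['content']; the real upload will differ.
sample_content = b"%PDF-1.4\n%\xf6\xe4\xfc\xdf\n1 0 obj\n"


def first_utf8_error(data: bytes):
    """Return (position, offending byte) of the first UTF-8 decode failure, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start, data[exc.start]
    return None


result = first_utf8_error(sample_content)
print(type(sample_content).__name__, result)
```

With these sample bytes the failure lands on byte 0xf6 at position 10, the same position and byte as in the traceback, which would suggest the raw PDF bytes themselves are being treated as UTF-8 text during JSON encoding.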
Obviously, this would be easier if I were able to handle utf-8 encoding/decoding when reading the PDF file from the file system (there are plenty of examples of that). However, in this case, we're having to deal with what we get via the web interface. Do I have to parse out the file data (from the file_dict dictionary) and then encode (or decode) in some way before I make a new dictionary and send that to Plone? Or have I missed the problem entirely?
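For concreteness, this is roughly the re-encoding I have in mind: a sketch that base64-encodes the uploaded bytes and builds a new, JSON-safe dictionary. The key layout (data, encoding, filename, content-type) is my reading of how plone.restapi expects file fields and should be treated as an assumption, as should the 'mime_type' key on the upload dict. to_json_safe_file is a hypothetical helper name, and the upload below is made-up sample data.

```python
import base64
import json


def to_json_safe_file(file_dict):
    """Build a JSON-serializable dict from an upload dict holding raw bytes.

    The output keys are an assumption about what the Plone REST API accepts.
    """
    return {
        # base64 output is pure ASCII, so it can be stored as a JSON string.
        "data": base64.b64encode(file_dict["content"]).decode("ascii"),
        "encoding": "base64",
        "filename": file_dict["filename"],
        "content-type": file_dict.get("mime_type") or "application/pdf",
    }


# Hypothetical upload, shaped like the dict file_upload() gives me.
upload = {
    "filename": "materials.pdf",
    "content": b"%PDF-1.4\n%\xf6\xe4\xfc\xdf\n",
    "mime_type": "application/pdf",
}

file_field = to_json_safe_file(upload)
# Serializing the payload no longer hits the raw bytes.
body = json.dumps({"@type": "Material", "file": file_field})
```

If this is the right direction, 'file': file_dict in the requests.post call would become 'file': to_json_safe_file(file_dict).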