Error when creating a File-like object via the RestAPI for Plone 6

ronc · October 25, 2023, 10:33pm

I have a situation where the user uploads a file to one server, which pre-processes it: the text of the file is examined to generate tags that will be added to the File-like object. Once the user has had the chance to add some extra metadata (via a web interface), the resulting title, summary, subjects (tags), and the PDF file are posted to Plone 6 with an API call:

    import requests

    plone_response = requests.post('https://texasbusinesslaw.org/content-staging',
                                   headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
                                   json={'@type': 'Material',
                                         'title': basic_data['title'],
                                         'description': basic_data['description'],
                                         'members_only': basic_data['members_only'],
                                         'file': file_dict,
                                         'text': basic_data['text'],
                                         'subjects': tags_to_tuple(tags=basic_data['tags'])
                                         },
                                   auth=('admin', 'secret'))

Instead of a response, running this raises the following traceback:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 473, in prepare_body
        body = complexjson.dumps(json, allow_nan=False)
      File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 385, in dumps
        return cls(
      File "/usr/lib/python3/dist-packages/simplejson/encoder.py", line 296, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/lib/python3/dist-packages/simplejson/encoder.py", line 378, in iterencode
        return _iterencode(o, 0)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 10: invalid start byte

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/ronc/.local/lib/python3.8/site-packages/pywebio/session/threadbased.py", line 86, in main_task
        target()
      File "materials.py", line 84, in main
        plone_response = requests.post('https://texasbusinesslaw.org/content-staging', headers={ 'Accept': 'application/json', 'Content-Type': 'application/json', },
      File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 117, in post
        return request('post', url, data=data, json=json, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/requests/api.py", line 61, in request
        return session.request(method=method, url=url, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 515, in request
        prep = self.prepare_request(req)
      File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 443, in prepare_request
        p.prepare(
      File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 321, in prepare
        self.prepare_body(data, files, json)
      File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 475, in prepare_body
        raise InvalidJSONError(ve, request=self)
    requests.exceptions.InvalidJSONError: 'utf-8' codec can't decode byte 0xf6 in position 10: invalid start byte

I suspect a problem with UTF-8 and non-printable characters. For reference, this is the code that reads the uploaded file into memory:

    from pywebio.input import file_upload

    file_dict = file_upload(label="Step 1: Select the PDF File Containing the Materials: ", multiple=False, accept=['.pdf'], required=True)
    filename = file_dict['filename']
    f = file_dict['content']  # raw bytes of the uploaded PDF

Obviously, this would be easier if I were able to handle UTF-8 encoding/decoding when reading the PDF file from the file system (there are plenty of examples of that). However, in this case, we have to deal with whatever we get via the web interface. Do I have to pull the file data out of the file_dict dictionary and encode (or decode) it in some way before building a new dictionary and sending that to Plone? Or have I missed the problem entirely?
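A minimal offline reproduction of the question, assuming the upload dict holds the file body as raw bytes under 'content' (as pywebio documents for file_upload). The standard-library json module refuses bytes outright, and PDF files usually contain binary bytes that are not valid UTF-8, so the file data has to be turned into text (for example base64) before it can go into a JSON body. The fake_upload dict and both helper names are hypothetical, for illustration only.

```python
import json

# Hypothetical stand-in for what pywebio's file_upload() returns:
# the file body is raw bytes, not text.
fake_upload = {
    "filename": "example.pdf",
    "content": b"%PDF-1.3\n%\xc4\xe5\xf2\xe5\n",  # binary comment line, typical of PDFs
    "mime_type": "application/pdf",
}


def try_dumps(obj):
    """Return None if obj serializes to JSON, otherwise the error message."""
    try:
        json.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


def is_utf8(data: bytes) -> bool:
    """True only if the bytes happen to decode as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(try_dumps({"file": fake_upload}))  # Object of type bytes is not JSON serializable
print(is_utf8(fake_upload["content"]))   # False
```

So neither sending the bytes as-is nor decoding them as UTF-8 works in general; an explicit text encoding such as base64 is needed.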

yurj · October 26, 2023, 9:40am

ronc:

                                   json={ '@type': 'Material', 
                                          'title': basic_data['title'], 
                                          'description': basic_data['description'], 
                                          'members_only': basic_data['members_only'], 
                                          'file': file_dict, 
                                          'text': basic_data['text'], 
                                          'subjects': tags_to_tuple(tags=basic_data['tags'])
                                          },

Did you try to load this as JSON (e.g. with json.loads())? I think the problem is in the JSON, not the PDF file.

ronc · October 26, 2023, 7:09pm

json.loads didn't work, but json.dumps seemed to make some headway. Now I just get a new error message:

Traceback (most recent call last):
  File "/home/ronc/.local/lib/python3.8/site-packages/pywebio/session/threadbased.py", line 86, in main_task
    target()
  File "materials.py", line 60, in main
    data = json.dumps({ '@type': 'Material', 'title': basic_data['title'], 'description': basic_data['description'], 'members_only': basic_data['members_only'], 'file': file_dict, 'text': basic_data['text'], 'subjects': tags_to_tuple(tags=basic_data['tags'])})
  File "/usr/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bytes is not JSON serializable

...which I presume refers to the file. Do you know of an example of a File object being created via the RestAPI?

yurj · October 27, 2023, 7:59am

Remove the entries one by one and find which one gives the error TypeError: Object of type bytes is not JSON serializable.
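The "remove them one by one" check can be automated. This is a small sketch, not part of any Plone or requests API: non_serializable_keys is a hypothetical helper that serializes each top-level value on its own, so one bad entry does not hide the others. The sample payload mirrors the keys used in this thread, with made-up values.

```python
import json


def non_serializable_keys(payload: dict) -> list:
    """Return the top-level keys whose values json.dumps() rejects."""
    bad = []
    for key, value in payload.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):  # TypeError for bytes/sets, ValueError for circular refs
            bad.append(key)
    return bad


# Hypothetical payload shaped like the one in this thread.
payload = {
    "@type": "Material",
    "title": "Dummy Title",
    "file": {"filename": "a.pdf", "content": b"%PDF-1.3"},
    "subjects": ("Trademark", "Patent Application"),  # tuples serialize as JSON arrays
}

print(non_serializable_keys(payload))  # ['file']
```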

Here is an example with the problems solved:

Here are the docs:
https://plonerestapi.readthedocs.io/en/docs-httpexamples-cleanup/serialization.html#upload-deserialization

or the Plone Documentation:
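A sketch of converting the uploaded file into the structure those docs describe for file fields (data, encoding, filename, content-type, the same keys used in the payload posted later in this thread). It assumes the pywebio upload dict has 'filename', 'content' (bytes) and 'mime_type' keys and that the content type's file field is called file; plone_file_field is a hypothetical helper name, and the 'application/pdf' fallback is an assumption suited to this PDF-only form.

```python
import base64
import json


def plone_file_field(upload: dict) -> dict:
    """Build a plone.restapi-style file field from a pywebio-style upload dict."""
    return {
        # base64 turns arbitrary bytes into plain ASCII text that JSON can carry
        "data": base64.b64encode(upload["content"]).decode("ascii"),
        "encoding": "base64",
        "filename": upload["filename"],
        "content-type": upload.get("mime_type") or "application/pdf",
    }


# Hypothetical upload standing in for pywebio's file_upload() result.
upload = {
    "filename": "example.pdf",
    "content": b"%PDF-1.3\n%\xc4\xe5\xf2\xe5\n",
    "mime_type": "application/pdf",
}

payload = {"@type": "Material", "title": "Dummy Title", "file": plone_file_field(upload)}
body = json.dumps(payload)  # succeeds now: every value is text
```

The payload dict itself is what would go to requests.post(url, json=payload, ...); requests serializes it, so it should not be run through json.dumps first.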

ronc · November 22, 2023, 7:26pm

Okay, I'm getting an odd deserialization/malformed body error. Here is the JSON I'm posting:

    {
        "@type": "Material",
        "title": "Dummy Title",
        "description": "F",
        "members_only": "yes",
        "file": {
            "data": "JVBERi0xLjMKJcTl8uX ... gXSA+PgpzdGFydHhyZWYKMTMwMjMyCiUlRU9GCg==",
            "encoding": "base64",
            "filename": "AI-Based Patent Applications Recent History and the Future Mintz.pdf",
            "content-type": "application/pdf"
        },
        "text": "G",
        "subjects": [
            "National Arti\\ufb01cial Intelligence Initiative Act",
            "Trademark",
            "Patent Infringement",
            "Patent Application",
            "Intellectual Property"
        ]
    }

which, incidentally, I checked and is valid JSON (per https://jsonlint.com/). However, the response that I get from Plone is:

    {
      "message": "'Malformed body'",
      "traceback": [
        "File \"/app/lib/python3.11/site-packages/ZPublisher/WSGIPublisher.py\", line 181, in transaction_pubevents",
        " yield",
        "",
        " File \"/app/lib/python3.11/site-packages/ZPublisher/WSGIPublisher.py\", line 391, in publish_module",
        " response = publish(request, new_mod_info)",
        " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^",
        "",
        " File \"/app/lib/python3.11/site-packages/ZPublisher/WSGIPublisher.py\", line 285, in publish",
        " result = mapply(obj,",
        " ^^^^^^^^^^^",
        "",
        " File \"/app/lib/python3.11/site-packages/ZPublisher/mapply.py\", line 98, in mapply",
        " return debug(object, args, context)",
        " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^",
        "",
        " File \"/app/lib/python3.11/site-packages/ZPublisher/WSGIPublisher.py\", line 68, in call_object",
        " return obj(*args)",
        " ^^^^^^^^^^",
        "",
        " File \"/app/lib/python3.11/site-packages/plone/rest/service.py\", line 21, in __call__",
        " return self.render()",
        " ^^^^^^^^^^^^^",
        "",
        " File \"/app/lib/python3.11/site-packages/plone/restapi/services/__init__.py\", line 19, in render",
        " content = self.reply()",
        " ^^^^^^^^^^^^",
        "",
        " File \"/app/lib/python3.11/site-packages/plone/restapi/services/content/add.py\", line 34, in reply",
        " data = json_body(self.request)",
        " ^^^^^^^^^^^^^^^^^^^^^^^",
        "",
        " File \"/app/lib/python3.11/site-packages/plone/restapi/deserializer/__init__.py\", line 15, in json_body",
        " raise DeserializationError(\"Malformed body\")"
      ],
      "type": "DeserializationError"
    }

Note that in this case, "Material" is a Dexterity content type derived from the File type, with a few extra properties. I just don't see where the "malformed body" problem is, and neither does my JSON checker.

There is a discussion of this error (albeit for Django) on Stack Overflow ("Django Deserialization Error Problem installing Fixture"). In that posting, they talk about a serialization format (see "Serializing Django objects" in the Django documentation). Is there something like that for Plone?

petschki · November 22, 2023, 9:07pm

The DeserializationError is raised when the decoded data is not a dict instance; see https://github.com/plone/plone.restapi/blob/main/src/plone/restapi/deserializer/__init__.py#L15. You could check the content of data there.
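One way to end up with a body that is valid JSON but not a dict, offered as an assumption to check rather than a diagnosis: serializing the payload twice. Earlier in the thread the payload was built with json.dumps(...); if that string is then passed as requests.post(..., json=data), requests serializes it again, and the server's json.loads returns a str instead of a dict. A double-encoded body still passes a JSON linter, because a quoted string is valid JSON. The sketch below imitates the linked check locally; mimic_json_body is a hypothetical helper and raises ValueError as a stand-in for Plone's DeserializationError.

```python
import json


def mimic_json_body(raw_body: str) -> dict:
    """Rough local imitation of plone.restapi's json_body() check:
    the decoded body must be a dict. ValueError stands in for
    DeserializationError here."""
    data = json.loads(raw_body or "{}")
    if not isinstance(data, dict):
        raise ValueError("Malformed body")
    return data


payload = {"@type": "Material", "title": "Dummy Title"}

# Serialized exactly once: decodes to a dict.
ok_body = json.dumps(payload)
print(type(mimic_json_body(ok_body)).__name__)  # dict

# Serialized twice: a JSON string whose content happens to be JSON,
# e.g. what json=json.dumps(payload) would send.
double_body = json.dumps(json.dumps(payload))
try:
    mimic_json_body(double_body)
except ValueError as exc:
    print(exc)  # Malformed body
```

If this is the cause, the fix is to serialize once: either pass the dict via json=payload, or pass a pre-serialized string via data=json.dumps(payload) together with the Content-Type: application/json header.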