Upload a PDF file with restapi

Here is an example of how to upload a file using plone.restapi:

  {
    "@type": "File",
    "title": "My file",
    "file": {
        "data": "TG9yZW0gSXBzdW0uCg==",
        "encoding": "base64",
        "filename": "lorem.txt",
        "content-type": "text/plain"}
  }
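The "data" value is just the file's bytes encoded with base64. Here is a minimal sketch using only Python's standard library. It builds the payload above from raw bytes and shows that "TG9yZW0gSXBzdW0uCg==" is the base64 form of the 13-byte text "Lorem Ipsum.\n". The helper name build_file_payload is hypothetical, not part of plone.restapi.

```python
import base64

def build_file_payload(raw: bytes, filename: str, content_type: str, title: str) -> dict:
    """Build a plone.restapi-style File payload (hypothetical helper)."""
    return {
        "@type": "File",
        "title": title,
        "file": {
            # base64 output is pure ASCII, so decoding it as ASCII is safe
            "data": base64.b64encode(raw).decode("ascii"),
            "encoding": "base64",
            "filename": filename,
            "content-type": content_type,
        },
    }

payload = build_file_payload(b"Lorem Ipsum.\n", "lorem.txt", "text/plain", "My file")
print(payload["file"]["data"])  # TG9yZW0gSXBzdW0uCg==
```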

The problem is that the base64-encoded content ("data" above) of a PDF file, even a reasonably sized one, is a very, very long string.
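For a sense of scale: base64 turns every 3 input bytes into 4 output characters, padding the last group. The encoded string is therefore about 4/3 the size of the file. A short sketch of that arithmetic, cross-checked against the standard library:

```python
import base64
import os

def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes: 4 * ceil(n / 3)."""
    return 4 * ((n_bytes + 2) // 3)

# Cross-check the formula against the standard library for a few sizes
for n in (0, 1, 2, 3, 13, 1000):
    assert base64_length(n) == len(base64.b64encode(os.urandom(n)))

print(base64_length(1_600_000))  # 2133336 characters for a 1,600,000-byte file
```

So a file of a couple of megabytes becomes a string of a few million characters. That is large for copy-pasting, but not unusual for a JSON parser.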

How do you deal with the upload of such a file using plone.restapi?

Even a huge string might not be a problem.

Try it out (with the huge string) and come back with a good problem description if you observe a problem.

What is your real problem?

We use plone.restapi to import files of 100 MB and more during migrations, with no issues at all.

I took you at your word :grinning:...

...but it doesn't work (okay, the huge string is 2,230,876 characters long):

{
  "message": "'No JSON object could be decoded'",
  "traceback": [
    "File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/Zope-4.1.1-py3.7.egg/ZPublisher/WSGIPublisher.py\", line 155, in transaction_pubevents",
    "    yield",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/Zope-4.1.1-py3.7.egg/ZPublisher/WSGIPublisher.py\", line 337, in publish_module",
    "    response = _publish(request, new_mod_info)",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/Zope-4.1.1-py3.7.egg/ZPublisher/WSGIPublisher.py\", line 255, in publish",
    "    bind=1)",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/Zope-4.1.1-py3.7.egg/ZPublisher/mapply.py\", line 85, in mapply",
    "    return debug(object, args, context)",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/Zope-4.1.1-py3.7.egg/ZPublisher/WSGIPublisher.py\", line 61, in call_object",
    "    return obj(*args)",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/plone.rest-1.4.0-py3.7.egg/plone/rest/service.py\", line 23, in __call__",
    "    return self.render()",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/plone.restapi-4.3.1-py3.7.egg/plone/restapi/services/__init__.py\", line 21, in render",
    "    content = self.reply()",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/plone.restapi-4.3.1-py3.7.egg/plone/restapi/services/content/add.py\", line 27, in reply",
    "    data = json_body(self.request)",
    "",
    "  File \"/home/bitouze/test-plone-5-2-python3/buildout-cache/eggs/plone.restapi-4.3.1-py3.7.egg/plone/restapi/deserializer/__init__.py\", line 11, in json_body",
    "    raise DeserializationError(\"No JSON object could be decoded\")"
  ],
  "type": "DeserializationError"
}
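The traceback ends in plone.restapi's json_body, which raises DeserializationError when the request body cannot be parsed as JSON. The following sketch uses only Python's json module and does not call plone.restapi. It shows that a multi-megabyte but well-formed body parses fine, while a body cut off part-way (for example by an incomplete paste) does not. The function name parses_as_json is hypothetical.

```python
import json

def parses_as_json(body: str) -> bool:
    """Return True if body is valid JSON according to Python's json module."""
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

# A well-formed body with a ~2.2 million character string value
big_body = json.dumps({"@type": "File", "file": {"data": "A" * 2_230_876}})
print(parses_as_json(big_body))        # True
print(parses_as_json(big_body[:-10]))  # False: truncated body
```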

See above. And do you manually copy-paste the huge strings?

As I said: no issues found with 50k files, some of them several hundred MB in size.

Regarding your problem: please file a reproducible bug report.

The file is this one (1.6 MB), and the JSON is the following:

{
    "@type": "File",
    "title": "My file",
    "file": {
        "encoding": "base64",
        "data": "⟨the huge string⟩",
        "filename": "test.pdf",
        "content-type": "application/pdf"
    }
}

This works nicely for a smaller (10.5 KB) PDF test file, but not with the real one above.
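Before pasting a payload by hand, it can help to check locally that it parses as JSON and that the data field is valid base64. One pitfall worth ruling out is line-wrapped base64, such as the output of the base64 command-line tool. This is an assumption, not something confirmed in this thread. Wrapped output contains raw line breaks, which are not allowed inside JSON strings. A hedged sketch; check_payload is a hypothetical helper:

```python
import base64
import json

def check_payload(text: str) -> list:
    """Return a list of problems found in a File payload (hypothetical helper)."""
    try:
        payload = json.loads(text)
    except ValueError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(payload, dict) or not isinstance(payload.get("file"), dict):
        return ["missing 'file' object"]
    problems = []
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # binascii.Error is a subclass of ValueError
        base64.b64decode(payload["file"].get("data", ""), validate=True)
    except ValueError as exc:
        problems.append(f"file.data is not valid base64: {exc}")
    return problems

good = '{"@type": "File", "file": {"data": "TG9yZW0gSXBzdW0uCg=="}}'
# A raw line break inside the string, as line-wrapped base64 would produce
wrapped = '{"@type": "File", "file": {"data": "TG9yZW0g\nSXBzdW0uCg=="}}'
print(check_payload(good))     # []
print(check_payload(wrapped))  # reports invalid JSON
```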

The error message indicates that your JSON serialization was faulty. It has nothing to do with the size of the uploaded file.

I recommend first constructing a Python dict with the necessary content and then using json.dumps to produce the JSON serialization.

Here is the Python script I use to construct the dict:

from base64 import b64encode
from json import dumps

ENCODING = 'utf-8'
FILE_NAME = 'en-ligne1.pdf'
JSON_NAME = 'output.json'

# first: reading the binary stuff
# note the 'rb' flag
# result: bytes
with open(FILE_NAME, 'rb') as open_file:
    byte_content = open_file.read()

# second: base64 encode read data
# result: bytes (again)
base64_bytes = b64encode(byte_content)

# third: decode these bytes to text
# result: string (in utf-8)
base64_string = base64_bytes.decode(ENCODING)

# optional: doing stuff with the data
# result here: some dict
raw_data = {
    "@type": "File",
    "title": "My file",
    "file": {
        "encoding": "base64",
        "data": base64_string,
        "filename": "test.pdf",
        "content-type": "application/pdf"
    }
}

# now: encoding the data to json
# result: string
json_data = dumps(raw_data, indent=2)

# finally: writing the json string to disk
# note the 'w' flag, no 'b' needed as we deal with text here
with open(JSON_NAME, 'w') as another_open_file:
    another_open_file.write(json_data)
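A quick way to confirm that the JSON written by the script above really contains the PDF is to read it back and compare the decoded bytes with the original. A minimal sketch: verify_roundtrip is a hypothetical helper name, and the demo uses temporary files with fake content instead of a real PDF.

```python
import json
import os
import tempfile
from base64 import b64decode, b64encode

def verify_roundtrip(original_path: str, json_path: str) -> bool:
    """Check that file.data in json_path decodes to the bytes of original_path."""
    with open(original_path, 'rb') as f:
        original = f.read()
    with open(json_path) as f:
        payload = json.load(f)
    return b64decode(payload["file"]["data"]) == original

# Self-contained demo with temporary files instead of a real PDF
with tempfile.TemporaryDirectory() as tmp:
    pdf_path = os.path.join(tmp, 'test.pdf')
    json_path = os.path.join(tmp, 'output.json')
    with open(pdf_path, 'wb') as f:
        f.write(b'%PDF-1.4 fake content\n')
    with open(pdf_path, 'rb') as f:
        data = b64encode(f.read()).decode('ascii')
    with open(json_path, 'w') as f:
        json.dump({"@type": "File", "file": {"data": data, "encoding": "base64"}}, f, indent=2)
    ok = verify_roundtrip(pdf_path, json_path)

print(ok)  # True
```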

The script runs smoothly, and I can see the content of the output.json file with cat. The file is too large to open and copy its content by hand, so I copy the content to the clipboard with:

$ cat output.json | xsel -ib

But when I paste the clipboard content into Postman's Body/raw window, nothing appears.

Please provide a standalone example Python script or curl command.

Are you sure that your HTTP headers are set properly?

You can use https://plone-demo.info (admin/admin) as a backend for testing.

We need a reproducible test case...

If you work with Postman, you can export the request via "Code" or as a Postman snippet so that we can try this ourselves.

The script looks good. Use json_data as the body of your HTTP POST request (use POST rather than GET, as GET requests typically have size limitations).
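As a standalone alternative to pasting the body into Postman, the serialized string can be sent as the body of a POST request directly from Python. This sketch uses only the standard library's urllib.request. The URL and the admin/secret credentials are placeholders taken from later in this thread, and the helper name build_post is hypothetical. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

def build_post(url: str, payload: dict, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request with HTTP basic auth."""
    body = json.dumps(payload).encode('utf-8')
    token = base64.b64encode(f'{user}:{password}'.encode('utf-8')).decode('ascii')
    return urllib.request.Request(
        url,
        data=body,
        method='POST',
        headers={
            'Accept': 'application/json',
            'Content-Type': 'application/json',
            'Authorization': f'Basic {token}',
        },
    )

req = build_post('http://nohost/plone/folder',
                 {"@type": "File", "title": "My file"}, 'admin', 'secret')
# To actually send it against a running site:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```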

Slightly off topic, since a few coworkers have already run into this too: it would be nice to have more examples of how to upload files/images in the plone.restapi docs. Any takers?

OK, I succeeded by adding a request of the following form to the previous Python script:

import requests

requests.post(
    'http://nohost/plone/folder',
    headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    json=raw_data,
    auth=('admin', 'secret'),
)

Thanks to everybody!

I fully agree! :slight_smile:

And a very nice addition would be examples of how to update the content of files/images. BTW, this requires TUS resumable upload, doesn't it?

Never mind! I missed "Updating a Resource with PATCH".
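Per the "Updating a Resource with PATCH" section mentioned above, an existing resource is updated by sending a PATCH request to the object's URL. The body contains only the fields to change. Here is a minimal stdlib sketch that builds (but does not send) such a request to replace a file. The object URL http://nohost/plone/folder/test.pdf and the helper name build_file_patch are hypothetical.

```python
import base64
import json
import urllib.request

def build_file_patch(object_url: str, raw: bytes, filename: str,
                     content_type: str) -> urllib.request.Request:
    """Build a PATCH request that replaces the 'file' field of an existing File."""
    payload = {
        "file": {
            "data": base64.b64encode(raw).decode('ascii'),
            "encoding": "base64",
            "filename": filename,
            "content-type": content_type,
        }
    }
    return urllib.request.Request(
        object_url,
        data=json.dumps(payload).encode('utf-8'),
        method='PATCH',
        headers={'Accept': 'application/json', 'Content-Type': 'application/json'},
    )

req = build_file_patch('http://nohost/plone/folder/test.pdf',
                       b'%PDF-1.4 new content\n', 'test.pdf', 'application/pdf')
# Authentication is omitted here; add an Authorization header as for the POST.
```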

FYI: TUS is optional and not required for uploading files/images. That said, I agree it would be nice to have examples for that as well.
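For anyone who does want TUS, here is a rough sketch of the headers defined by the tus 1.0.0 protocol. It does no networking. A creation request declares the total size and the metadata, and the data is then sent in PATCH requests, each stating its byte offset. The endpoint these requests go to in plone.restapi (the docs describe an @tus-upload endpoint) should be checked against the documentation. The helper names are hypothetical.

```python
import base64

TUS_VERSION = '1.0.0'

def tus_creation_headers(size: int, metadata: dict) -> dict:
    """Headers for the tus 'creation' POST. Upload-Metadata is a comma-separated
    list of 'key base64(value)' pairs, as defined by the tus 1.0.0 protocol."""
    encoded = ','.join(
        f"{key} {base64.b64encode(value.encode('utf-8')).decode('ascii')}"
        for key, value in metadata.items()
    )
    return {
        'Tus-Resumable': TUS_VERSION,
        'Upload-Length': str(size),
        'Upload-Metadata': encoded,
    }

def tus_chunks(raw: bytes, chunk_size: int):
    """Yield (headers, chunk) pairs for the PATCH requests that upload the data.
    Assumes every PATCH succeeds; a real client would take the next offset
    from the server's Upload-Offset response header."""
    for offset in range(0, len(raw), chunk_size):
        chunk = raw[offset:offset + chunk_size]
        headers = {
            'Tus-Resumable': TUS_VERSION,
            'Upload-Offset': str(offset),
            'Content-Type': 'application/offset+octet-stream',
        }
        yield headers, chunk

headers = tus_creation_headers(10, {'filename': 'test.pdf', 'content-type': 'application/pdf'})
print(headers['Upload-Metadata'])  # filename dGVzdC5wZGY=,content-type YXBwbGljYXRpb24vcGRm
```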