Finding the problem caused by a <pre> tag in a RichText (text) field

I have a custom Dexterity content type. It has two RichText fields (text and one called metadata). I created content using the Plone API, and many of the objects work correctly (the fields all have proper data and all are editable). However, for some of the created objects, the content of the RichText (text) field is ASCII text with <pre> and </pre> HTML tags tacked on to the beginning and end of the text. (NOTEZ BIEN: that field is populated with nothing but ASCII text.) The only "potentially" odd character is a form feed (\f) character. Even so, when the object is created, the type of the text data is <type 'unicode'>. For those objects, I get an error when I try to edit the object.

Here is the full error message:


Traceback (innermost last):

Module ZPublisher.Publish, line 138, in publish
Module ZPublisher.mapply, line 77, in mapply
Module ZPublisher.Publish, line 48, in call_object
Module plone.z3cform.layout, line 63, in __call__
Module plone.z3cform.layout, line 57, in update
Module z3c.form.form, line 162, in render
Module zope.browserpage.viewpagetemplatefile, line 49, in __call__
Module zope.pagetemplate.pagetemplate, line 137, in pt_render
Module five.pt.engine, line 98, in __call__
Module z3c.pt.pagetemplate, line 163, in render
Module chameleon.zpt.template, line 261, in render
Module chameleon.template, line 171, in render
Module 8d1f47e2365fac4ee588885a40545d90.py, line 91, in render
Module 35d37b17cb92bd6236762edd00e240ad.py, line 1826, in render_titlelessform
Module 35d37b17cb92bd6236762edd00e240ad.py, line 451, in render_fields
Module 35d37b17cb92bd6236762edd00e240ad.py, line 126, in render_widget_rendering
Module 35d37b17cb92bd6236762edd00e240ad.py, line 1069, in render_field
Module five.pt.expressions, line 161, in __call__
Module Products.Five.browser.metaconfigure, line 485, in __call__
Module zope.browserpage.viewpagetemplatefile, line 81, in __call__
Module zope.browserpage.viewpagetemplatefile, line 49, in __call__
Module zope.pagetemplate.pagetemplate, line 137, in pt_render
Module five.pt.engine, line 98, in __call__
Module z3c.pt.pagetemplate, line 163, in render
Module chameleon.zpt.template, line 261, in render
Module chameleon.template, line 191, in render
Module chameleon.template, line 171, in render
Module 3728de77e9e3f669b09acd93459dd5ea.py, line 610, in render
Module 3728de77e9e3f669b09acd93459dd5ea.py, line 481, in render_widget_wrapper
Module five.pt.expressions, line 161, in __call__
Module plone.app.z3cform.widget, line 646, in render
Module plone.app.widgets.base, line 309, in __init__
Module plone.app.widgets.base, line 323, in _set_value
Module lxml.etree, line 1033, in lxml.etree._Element.text.__set__ (src/lxml/etree.c:55089)
Module lxml.etree, line 716, in lxml.etree._setNodeText (src/lxml/etree.c:25876)
Module lxml.etree, line 704, in lxml.etree._createTextNode (src/lxml/etree.c:25739)
Module lxml.etree, line 1444, in lxml.etree._utf8 (src/lxml/etree.c:32958)

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

  • Expression: "widget/@@ploneform-render-widget"
  • Filename: ... rm-3.0.4-py2.7.egg/plone/app/z3cform/templates/macros.pt
  • Location: (line 100: col 81)
  • Source: ... place="structure widget/@@ploneform-render-widget"/>
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  • Expression: "widget/render"
  • Filename: ... rm-3.0.4-py2.7.egg/plone/app/z3cform/templates/widget.pt
  • Location: (line 39: col 46)
  • Source: ... xt" tal:replace="structure widget/render"
    ^^^^^^^^^^^^^
  • Arguments: repeat: {...} (0)
    context: <RichTextWidget IRichText.text at 0x7efe1d9741d0>
    views: <ViewMapper - at 0x7efe1d9eed10>
    modules: <TraversableModuleImporter - at 0x7efe28381c90>
    args: <tuple - at 0x7efe30763050>
    nothing: <NoneType - at 0x56450a4de4d0>
    target_language: <NoneType - at 0x56450a4de4d0>
    default: <object - at 0x7efe306b0540>
    request: <instance - at 0x7efe1e28ed40>
    wrapped_repeat: {...} (0)
    loop: {...} (0)
    template: <ViewPageTemplateFile - at 0x7efe1df4ffd0>
    translate: <function translate at 0x7efe1c4d4230>
    options: {...} (0)
    view: <RenderWidget ploneform-render-widget at 0x7efe1d9ee410>

Note the part about ASCII/Unicode/no NULL bytes. I know that the data is not null because it is entirely readable (as preformatted text) when the object is viewed (without an error message). I also checked my code, and the data type for the RichText field is <type 'unicode'>. Moreover, this problem does not occur when the text data is not preformatted HTML (<pre>).
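The ValueError above complains about control characters as well as NULL bytes, so one way to narrow things down is to scan the field's text for characters that XML 1.0 does not allow. The following is only a minimal sketch using the standard library; the helper name `find_xml_incompatible_chars` and the sample string are made up for illustration, and the character class is an assumption based on the XML 1.0 rules (C0 control characters other than tab, line feed and carriage return), not a copy of what lxml checks internally.

```python
import re

# Assumption: the characters worth flagging are the C0 controls that
# XML 1.0 forbids, i.e. everything below 0x20 except \t, \n and \r.
_XML_ILLEGAL = re.compile(u"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_xml_incompatible_chars(text):
    """Return a list of (offset, character) pairs for each forbidden character."""
    return [(m.start(), m.group()) for m in _XML_ILLEGAL.finditer(text)]


# Hypothetical sample: plain text wrapped in <pre> tags with one form feed.
sample = u"<pre>PAGE ONE\n\fPAGE TWO\n</pre>"
hits = find_xml_incompatible_chars(sample)
for offset, char in hits:
    print(offset, repr(char))
```

An empty result would suggest the text is not the culprit; a hit on `\x0c` would point at a form feed.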

The problem is not viewing the data. It renders correctly. It is only when I try to edit the object that I get that error message. Prepending and appending to the <pre> and </pre> tags, respectively, did not help.

Any ideas what the problem with <pre> is all about? Am I missing something?

The blanks and odd spacing in my post are because the <pre> and </pre> tags were not marked up as code. Should have done that, sorry.

The problem that I have been experiencing is only when you edit an object with <pre> tags in the RichText (text) property. It renders fine (as pre-formatted text), but I get that error message when trying to edit the object.

It's always a good idea to sanitize HTML snippets before importing them into Plone...there are various Python solutions for that.

Hmmmm.... What, exactly, do you mean by HTML snippets? In this case, the only HTML included in the text was the <pre> and </pre> tags at the beginning and end, respectively. That is all that it took to cause the problem. Note, in other instances of the same class, the text field was provided with copies of HTML-encoded text -- with no problems. That suggests something about the pre tags is causing the problem.

I need the pre tags because the text is old (pre-1950) and simply formatted with newlines ('\n'). Otherwise, Plone renders it as HTML, the \n's are collapsed, and the whole text looks like a jumbled block.

NOTEZ BIEN:

I performed a few experiments, and this problem is isolated to (and so far only to) situations where the text field contains the pre tags surrounding plain text AND when the property is so loaded and the object is instantiated via the Plone API. You don't get this same problem when you create a standard Page object and use Tools | Source Code to pre/append pre tags to plain text.

The obvious work-around would be to edit the text and insert paragraph ("p") tags around the \n characters (or better yet, simply add a break tag with each \n), but I don't think that we should have to do that. What is it about the API instantiation process that causes this problem?
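For reference, the break-tag workaround mentioned above could look roughly like the sketch below. It uses only the standard library; the function name `plain_text_to_html` is hypothetical, and escaping `&`, `<` and `>` first is my own assumption about what the old plain text needs. It does not remove control characters.

```python
from xml.sax.saxutils import escape


def plain_text_to_html(text):
    """Escape plain text and mark each line break with a <br /> tag."""
    # Normalise Windows and old Mac line endings to "\n" first.
    text = text.replace(u"\r\n", u"\n").replace(u"\r", u"\n")
    # Escape &, < and > so that the plain text cannot be read as markup.
    escaped = escape(text)
    # Keep the "\n" after each tag so the HTML source stays readable.
    return escaped.replace(u"\n", u"<br />\n")


html = plain_text_to_html(u"a < b\nc & d")
print(html)
```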

Well, the problem was not with an HTML snippet. Rather, as far as I can tell, it was with a form feed ("new page") character ("\f"). For whatever reason, Plone would render it correctly (I think by ignoring it). However, upon edit, you were greeted with the error message that I mentioned above. In any case, for posterity, if you run into a Word document that includes the \f character, then you might want to remove it. That's the workaround that works for me.
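A minimal sketch of that workaround, assuming the text is cleaned as a unicode string before it is handed to the RichText field: the helper name `strip_xml_incompatible_chars` is hypothetical, and the set of characters removed (C0 controls other than tab, line feed and carriage return) is an assumption drawn from the XML 1.0 rules rather than anything specific to Plone.

```python
import re

# Assumption: remove the C0 control characters that XML 1.0 forbids,
# which includes the form feed (\x0c); \t, \n and \r are kept.
_XML_ILLEGAL = re.compile(u"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_xml_incompatible_chars(text, replacement=u""):
    """Return text with forbidden control characters replaced."""
    return _XML_ILLEGAL.sub(replacement, text)


# Hypothetical sample resembling text pulled from a Word document.
raw = u"<pre>PAGE ONE\n\fPAGE TWO\n</pre>"
clean = strip_xml_incompatible_chars(raw)
print(repr(clean))
```

Passing `replacement=u"\n"` instead would keep a visible break where each page boundary used to be.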