I have been struggling with importing an old FrontPage 2003 site into Plone.

Finally, I thought things were OK, since the content displays correctly.
Unfortunately, when I try to edit a page I get:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

If I check 'obj.text.output', I get the output below. Is there a way I can search / replace / change this from a script, so I don't have to redo the whole import (20,000 pages)?

b'Familjen Onstads verksamhet under bekv\xc3\xa4mlighetsflagg \xc3\xa4r kans'

PS: Unfortunately, some of the files have both this AND   etc.

Thanks. Do you know if it is possible to find out what needs to be changed? There must be some control characters, but I can't see them in obj.text.output.
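One way to make the invisible characters visible is to scan the text for code points that XML 1.0 does not allow and print their positions. This is only a sketch: `find_illegal_chars` is a made-up helper name, and it assumes the culprits are C0 control characters other than tab, newline and carriage return (those three are legal in XML, so they are probably not what triggers the ValueError).

```python
import re

# Assumption: the offending characters are C0 controls that XML 1.0 forbids.
# Tab (\x09), newline (\x0a) and carriage return (\x0d) are allowed, so they are excluded.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_chars(text):
    """Return a list of (position, repr of character) for each forbidden character."""
    if isinstance(text, bytes):
        # Assumption: the stored bytes are UTF-8; undecodable bytes become U+FFFD.
        text = text.decode("utf-8", errors="replace")
    return [(m.start(), repr(m.group())) for m in _XML_ILLEGAL.finditer(text)]


if __name__ == "__main__":
    sample = "abc\x0bdef\tghi"
    for position, char in find_illegal_chars(sample):
        print(position, char)
```

Running this over `obj.text.raw` (or `obj.text.output`) for a page that fails should show which characters are present and where.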

I think you have to convert this to its Unicode equivalent.
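For what it is worth, the sample above is valid UTF-8 stored as bytes, so decoding it gives readable text. A minimal sketch, assuming the values are UTF-8 with a Windows-1252 fallback for older FrontPage files (`to_text` is a hypothetical helper, not part of Plone):

```python
def to_text(value, encodings=("utf-8", "cp1252")):
    """Decode bytes to str, trying each encoding in turn."""
    if isinstance(value, str):
        return value
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte to a character, so it never fails.
    return value.decode("latin-1")


if __name__ == "__main__":
    raw = b'Familjen Onstads verksamhet under bekv\xc3\xa4mlighetsflagg \xc3\xa4r kans'
    print(to_text(raw))
```

The order of the encodings matters: UTF-8 is tried first because it fails loudly on text in another encoding, while cp1252 accepts almost any byte sequence.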

In fact, that is not the problem. It looks like there are some control characters (\t, \n or similar) that mess up some of the pages; I am not exactly sure which yet. There is so much hard-coded HTML (thank you, Microsoft) and so many different encodings. Maybe it is better to do some kind of lxml clean. I will try that and report back: loop over the pages, read the body text, run bodytext = lxml.html.clean.clean_html(bodytext), and save the body text.
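The plan above could look roughly like this. It is a sketch, not a tested migration script: `clean_bodytext` is a made-up name, the control characters are stripped before the HTML goes to lxml, and the import is guarded because newer lxml releases ship the cleaner in the separate `lxml_html_clean` package. The Plone loop is shown only as comments, since it assumes `plone.api`, a `text` RichText field and the `Document` portal type, none of which are confirmed in this thread.

```python
import re

# Assumption: use lxml_html_clean if installed, else the older lxml.html.clean
# location; if neither is available, only the control characters are stripped.
try:
    from lxml_html_clean import clean_html
except ImportError:
    try:
        from lxml.html.clean import clean_html
    except ImportError:
        clean_html = None

# C0 control characters that XML 1.0 forbids (tab, newline, carriage return are legal).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_bodytext(bodytext):
    """Strip forbidden control characters, then run lxml's HTML cleaner if present."""
    if isinstance(bodytext, bytes):
        # Assumption: stored bytes are UTF-8.
        bodytext = bodytext.decode("utf-8", errors="replace")
    bodytext = _CONTROL_CHARS.sub("", bodytext)
    if clean_html is not None and bodytext.strip():
        bodytext = clean_html(bodytext)
    return bodytext


# Hypothetical loop inside a Plone script (untested here):
#
# from plone import api
# from plone.app.textfield.value import RichTextValue
#
# for brain in api.content.find(portal_type="Document"):
#     obj = brain.getObject()
#     if obj.text is None:
#         continue
#     obj.text = RichTextValue(clean_bodytext(obj.text.raw), "text/html", "text/x-html-safe")
#     obj.reindexObject()
```

Trying this on a handful of failing pages first, before running it over all of them, would show whether the cleaner removes markup you want to keep.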

Will report back

I got rid of some by replacing \t and \n before the import, but it seems like there are more.