Handling Text Docs

Hey all,

I have a site that allows extremely non-technical users to cut and paste documents they’ve created into a rich text editor.

Once they paste and save the document (normally from a word processor), I strip out all the RTF formatting and apply my own styles so the text displays consistently, then store it in the database.
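For anyone trying to picture the strip-and-restyle step, here is a minimal sketch in Python (chosen only for illustration). It assumes the pasted content reaches the server as HTML markup. The tag whitelist, the `doc-text` class name, and the `strip_formatting` helper are all made up for this example; none of them come from my actual code.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: only these tags are worth keeping; everything else is dropped.
ALLOWED_TAGS = {"p", "div", "br", "b", "i", "ul", "ol", "li"}
VOID_TAGS = {"br"}
# Hypothetical class name that carries the site's own styles.
SITE_CLASS = "doc-text"


class FormatStripper(HTMLParser):
    """Keep whitelisted tags, drop every attribute, discard the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            if tag in ("p", "div"):
                # Block tags get the site's class instead of pasted styles.
                self.out.append(f'<{tag} class="{SITE_CLASS}">')
            else:
                self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so stray < or & cannot become markup.
            self.out.append(escape(data, quote=False))


def strip_formatting(markup):
    parser = FormatStripper()
    parser.feed(markup)
    parser.close()  # flush any trailing text still buffered
    return "".join(parser.out)
```

A whitelist tends to be easier to maintain than a blacklist here, since word processors keep inventing new tags and attributes.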

However, some users also write directly in the editor (nicEdit), and when they save there is no RTF formatting, only <div> and <br> tags. The problem is that the first line the editor generates is not wrapped in any tag, so the reformatting process breaks.
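One way around the unwrapped first line might be to stop relying on the editor's tags at all: split the text into paragraphs wherever a <div>, <p>, or <br> occurs, then re-wrap every paragraph the same way. A rough Python sketch of that idea follows. The input shape is assumed from the description above rather than taken from nicEdit's documentation, and `ParagraphCollector` and `normalize_paragraphs` are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect text into paragraphs, breaking at <div>, <p> and <br>."""

    BREAK_TAGS = {"div", "p", "br"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self.current = []

    def finish_paragraph(self):
        # Collapse whitespace runs and skip paragraphs that end up empty.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BREAK_TAGS:
            self.finish_paragraph()

    def handle_endtag(self, tag):
        if tag in self.BREAK_TAGS:
            self.finish_paragraph()

    def handle_data(self, data):
        self.current.append(data)


def normalize_paragraphs(markup):
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()             # flush text buffered by the parser
    collector.finish_paragraph()  # catch text with no closing tag after it
    return "".join(
        f"<div>{escape(p, quote=False)}</div>" for p in collector.paragraphs
    )
```

With this, input like `First line<div>Second line</div>` comes out with both lines wrapped in their own <div>, so later styling code sees one uniform structure. The trade-off is that inline formatting such as bold is discarded in this sketch.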

So my general question is: How do sites that allow large text docs to be submitted by users handle the insane amount of formatting to get all docs to appear consistently when displayed later? What is the overall practice for handling the display/formatting of large docs?

These users are not able or willing to add any kind of markup themselves, so it’s something I have to do in code.

Any suggestions would be greatly appreciated :)

Thanks!

Have you looked at PHP's Tidy extension? http://php.net/manual/en/book.tidy.php