Samuel Sjöberg's weblog

Generating XML Paragraphs

When I built this weblog I wanted it to be easy to post entries and comments. My solution is to allow a subset of XHTML tags and then validate the entry before it is accepted.

When writing a post, I have no problem adding an em here and an h2 there, but all the p's can drive you nuts. My next quest was therefore to find a solution that automagically adds paragraphs without violating other block tags, destroying code sections and so on.

After some battling on my own, I started looking for other solutions. The first one I found was Photo Matt's New Lines to Paragraphs script. After a few test runs I concluded that it didn't work for me. However, some of its regular expressions were different from the ones I had come up with, so I started to combine my own ideas with Matt's script.

My solution can be found in this source file. It's too much mind-crumbling code and entity-swapping to allow for publication in the post... Below is a short explanation.


The function I've written handles far fewer tags than Matt's, but in return it handles line breaks correctly when they are nested inside code blocks. It also deals with paragraphs that are nested inside li elements (which is possible in some cases). I'm using a function-call replacement to fix the paragraphs inside code and li elements. I found that to be the easiest way (for me) to solve the problem.
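
To give a rough idea of what a function-call replacement looks like, here is a minimal sketch. It is written in Python rather than PHP, and it is not the code from my source file: the names add_paragraphs, _protect_newlines, BLOCK_TAGS and NEWLINE_TOKEN are made up for the illustration, the list of block tags is an assumption, and the li handling is left out entirely. The callback is run once for every pre block and hides its line breaks, so that the paragraph splitting cannot touch them.

```python
import re

# Block-level tags that must not be wrapped in <p>. This short list is an
# assumption for the sketch, not the list used by the real script.
BLOCK_TAGS = ("h1", "h2", "h3", "ul", "ol", "pre", "blockquote")

# Placeholder for hidden newlines; assumed never to occur in real input.
NEWLINE_TOKEN = "\x00"


def _protect_newlines(match):
    """Callback: hide the newlines inside one matched <pre> block."""
    return match.group(0).replace("\n", NEWLINE_TOKEN)


def add_paragraphs(text):
    """Wrap blank-line separated chunks in <p>, leaving block tags alone."""
    text = text.replace("\r\n", "\n").strip()

    # Function-call replacement: re.sub calls _protect_newlines for each
    # <pre>...</pre> match and uses the return value as the replacement.
    text = re.sub(r"<pre\b.*?</pre>", _protect_newlines, text, flags=re.DOTALL)

    block_start = re.compile(r"<(%s)\b" % "|".join(BLOCK_TAGS))
    chunks = []
    for chunk in re.split(r"\n\s*\n", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        if block_start.match(chunk):
            chunks.append(chunk)  # already a block element, keep as-is
        else:
            chunks.append("<p>" + chunk + "</p>")

    # Restore the newlines that were hidden inside <pre> blocks.
    return "\n\n".join(chunks).replace(NEWLINE_TOKEN, "\n")
```

Calling add_paragraphs on a text with a blank line inside a pre block leaves that block in one piece, while the loose text around it gets its p tags.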

Handling entities

After the problem with paragraphs was fixed, I faced the dreaded entities. The problem was that after I had submitted &©, it was converted into ©. A simple regular expression was all that was needed to fix it. I'm still a bit confused by the names of the entity functions, though - don't know what I was thinking...

Anyway, decodeEntities is called when XHTML output is wanted, i.e. when the entities should display as symbols. encodeEntities is used when the entities should display as a group of characters and thus be editable.
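
The idea can be sketched with one regular expression per direction. Again, this is a Python illustration under my own assumptions and not the PHP from the source file: encode_entities and decode_entities are stand-in names for encodeEntities and decodeEntities, and the ENTITY pattern is simply a guess at what counts as an entity (named, decimal and hexadecimal references).

```python
import re

# Body of an entity reference after the ampersand: named (copy;),
# decimal (#169;) or hexadecimal (#xA9;). The pattern is an assumption.
ENTITY = r"(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);"


def encode_entities(text):
    """Escape the ampersand of every entity so that it is shown as a
    group of characters (and can be edited) instead of as a symbol."""
    return re.sub(r"&" + ENTITY, r"&amp;\1;", text)


def decode_entities(text):
    """Undo one level of encode_entities so that the entity is rendered
    as a symbol again in the XHTML output."""
    return re.sub(r"&amp;" + ENTITY, r"&\1;", text)
```

A lone ampersand that is not part of an entity is left untouched by both functions, and decoding an encoded text gives back the original.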

The pieces combined

All the pieces needed to solve the problem are now present, and the convert2xml function is called to do the magic.

Of course, there is a convert2char function as well. You can find it together with the other goodies I've described in the PHP source file I've uploaded.

Reader comments

  1. Even though I'm currently using your script, I have been thinking about whether a different approach could be more efficient. I haven't put any effort into this yet - already having a solution that works OK doesn't really encourage me to do so - but the idea involves simply stepping through the text content from start to end, keeping track of encountered tags using a stack.

    I'll be sure to drop a note here if I ever follow through with this. It's really a matter of having sufficiently little to do, I guess...

    3rd August 2005, 11:35 CET. 

About this post

Created 3rd August 2005 00:07 CET. Filed under PHP.