…well, their syndication feeds anyways. Here’s the problem:
While working on SimplePie initially, I used copies of Dunstan’s Atom and RSS feeds because I felt that they’d be representative of most people’s decently well-formed feeds. I know that some people have worse feeds, and that Mark Pilgrim’s feeds are a bit too “academically” correct.
Dunstan has a problem with his feed. He uses the numeric entity for a “smart-apostrophe” in his feed’s <title> tag. This happens to be a Unicode character outside the ASCII range. For whatever reason, every PHP-based feed reader I’ve ever used displays that smart-apostrophe as a question mark when parsing his feed. In wanting to build a “feed parser for the rest of us”, I decided to be smart and wrap a CDATA section around the contents of <description> on the fly for feeds that don’t already have one. Dunstan’s question mark becomes the character that it’s supposed to be. No problem.
On the other hand, Andrei also has a problem with his feed. Well, not really… it’s just that the fix I put in place for Dunstan’s feed broke Andrei’s. Andrei does a fake-out with his CDATA sections. He closes the CDATA section in <description>, then has one last bit of content before closing the tag. This is just enough to get past SimplePie’s logic. Wonderful.
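SimplePie’s actual check isn’t shown here, so the following Python sketch (not SimplePie’s PHP) only illustrates one way a wrapper like this could be fooled. It assumes a hypothetical rule, naive_wrap, that skips content ending in "]]>" and wraps everything else; the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

def naive_wrap(content):
    """Hypothetical rule: content counts as already wrapped only if it ends with ']]>'."""
    if content.rstrip().endswith("]]>"):
        return content
    return "<![CDATA[" + content + "]]>"

def is_well_formed(description_body):
    """Return True if the body parses when placed inside a <description> element."""
    try:
        ET.fromstring("<description>" + description_body + "</description>")
        return True
    except ET.ParseError:
        return False

# Invented sample bodies, not taken from anyone's real feed.
plain = "<p>Hello</p>"
wrapped = "<![CDATA[<p>Hello</p>]]>"
fake_out = "<![CDATA[<p>Hello</p>]]> and a bit more"

for body in (plain, wrapped, fake_out):
    print(is_well_formed(naive_wrap(body)), naive_wrap(body))
```

Under this assumed rule, the fake-out body gets wrapped a second time. The inner "]]>" then closes the outer section early and leaves a stray "]]>" in the text, which XML does not allow.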
Since Dunstan’s issue is only in the feed’s <title> tag, I went ahead and changed how SimplePie handles feeds by removing the code for wrapping CDATA sections around <description>. Both Dunstan and Andrei have working feeds again.
Then, I go and test it on Mark’s Feed Parser project feed. SimplePie breaks down again. Well, crap. Instead of using <title> like normal people, Mark has to be all cool by using <title type="text/plain">. Argh.
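If the parser finds the tag with a literal match on "<title>" (an assumption; SimplePie’s matching code isn’t shown here), any attribute is enough to defeat it. A minimal Python sketch of a more tolerant pattern, using a hypothetical helper named first_title:

```python
import re

# Accept <title> with or without attributes, but not longer names such as <titles>.
TITLE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title>", re.DOTALL)

def first_title(xml_text):
    """Return the raw contents of the first <title> element, or None if there is none."""
    match = TITLE.search(xml_text)
    return match.group(1) if match else None

print(first_title("<title>Plain title</title>"))
print(first_title('<title type="text/plain">Typed title</title>'))
```

A regular expression is still a shortcut here. It ignores namespaces and nesting, so it is a sketch of the idea rather than a replacement for a real XML parser.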
So, I’m off to find some code that can resolve this little quarrel. I’m thinking about going through and looking for numeric character entities (4-digit, typically beginning with an 8, such as &#8217;) and wrapping CDATA sections around those entities alone, which will probably work. I don’t want to release this software as 1.0 until it performs satisfactorily with every single feed in my entire reading list.
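Here is a rough Python sketch of that idea (again, not SimplePie’s PHP). The helper name wrap_entities and the sample strings are made up for illustration, and the four-digit pattern simply mirrors the heuristic above. Text already inside a CDATA section is left alone so that sections never end up nested.

```python
import re
import xml.etree.ElementTree as ET

# Existing CDATA sections; the capturing group keeps them in the split result.
CDATA_SECTION = re.compile(r"(<!\[CDATA\[.*?\]\]>)", re.DOTALL)
# Four-digit decimal entities such as &#8217; (assumption: only these need help).
NUMERIC_ENTITY = re.compile(r"&#\d{4};")

def wrap_entities(xml_text):
    """Wrap each four-digit numeric entity outside CDATA in its own CDATA section."""
    parts = CDATA_SECTION.split(xml_text)
    result = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions are existing CDATA sections: pass them through untouched.
            result.append(part)
        else:
            result.append(
                NUMERIC_ENTITY.sub(lambda m: "<![CDATA[" + m.group(0) + "]]>", part)
            )
    return "".join(result)

fixed = wrap_entities("<title>Dunstan&#8217;s blog</title>")
print(fixed)
# The XML parser now hands the entity through as literal text.
print(ET.fromstring(fixed).text)
```

One known gap in this sketch: an entity inside an attribute value would also be wrapped, which is not valid XML, so a real fix would need to restrict the search to element content.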