A short time ago, while playing around with markup validators, I encountered an interesting problem. This problem has also been mentioned on the wmlprogramming list, but it's worthy of a bit more explanation. I received a validation error, and at first I thought the cause was bad XHTML markup. As I investigated, however, I realised that every document with a particular DOCTYPE was encountering the same problem…
The DOCTYPE in question (below) refers to the OMA XHTML Mobile Profile 1.1 DTD.
The validator output was relatively helpful in reporting what the problem was:
Non UTF-8 byte in input
but in this case it was not so good at telling me where the problem was. I was pulling my hair out searching for this non-UTF-8 character in the input XHTML document, and then it dawned on me to check the other input to the validator, the DTD itself – the one published on the OMA site that I had downloaded and stored locally.
I figured I'd mangled it somehow during the download, so I downloaded it again – and got the same result. So then I tried to reference the DTD directly from its URI, and once again the UTF-8 error occurred.
I then opened up the DTD, and in the comment section toward the beginning of the document there was a chunk of text (below) which contained a set of smart quotes that were apparently not encoded as valid UTF-8. The characters themselves exist in Unicode; the file most likely stores them as single bytes from a legacy encoding such as Windows-1252, and those bytes are not legal UTF-8 sequences.
NO REPRESENTATIONS OR WARRANTIES (WHETHER EXPRESS OR IMPLIED) ARE
MADE BY THE OPEN MOBILE ALLIANCE OR ANY OPEN MOBILE ALLIANCE MEMBER
OR ITS AFFILIATES REGARDING ANY OF THE IPR’S REPRESENTED ON THE “OMA
IPR DECLARATIONS” LIST, INCLUDING, BUT NOT LIMITED TO THE ACCURACY,
COMPLETENESS, VALIDITY OR RELEVANCE OF THE INFORMATION OR WHETHER OR
NOT SUCH RIGHTS ARE ESSENTIAL OR NON-ESSENTIAL.
The apostrophe in "IPR'S" and the two quote characters surrounding "OMA IPR DECLARATIONS" were making the validator barf. Easy solution: replace these characters with plain ASCII quotes. The good news is that this text is in a comment section, so no fear of breaking the DTD.
Anyway, I replaced the offending quotes in the DTD and all the previously invalid documents now validated. The updated DTD is posted below for your convenience (use at your own risk). Perhaps OMA will publish an updated version.
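Since the validator only said that a bad byte existed and not where, a small script can do the searching instead. The following is a minimal Python sketch, not what I actually ran at the time: the function names `find_invalid_utf8` and `report` are made up for illustration, and the path you pass in would be wherever you saved your local copy of the DTD.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, byte_value) pairs for bytes that stop UTF-8 decoding."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything from the current position onward.
            data[pos:].decode("utf-8")
            break  # the rest of the input is clean
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add pos back on.
            offset = pos + err.start
            bad.append((offset, data[offset]))
            pos = offset + 1  # skip the bad byte and keep scanning
    return bad


def report(path: str) -> None:
    """Print the line number and value of each undecodable byte in a file."""
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, value in find_invalid_utf8(data):
        line_no = data.count(b"\n", 0, offset) + 1
        print(f"line {line_no}, byte offset {offset}: 0x{value:02X}")
```

Calling `report()` on the saved DTD should point straight at the lines of the comment block that contain the stray quote bytes, which beats scanning the file by eye.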
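If you would rather patch your own copy than trust mine, the replacement can be scripted. This is only a sketch under an assumption I have not verified: that the stray bytes are Windows-1252 smart quotes. The handler name `cp1252_fallback` and the function `fix_smart_quotes` are hypothetical names chosen for this example.

```python
import codecs

# Map the Unicode smart quotes to their plain ASCII equivalents.
ASCII_QUOTES = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark (used as the apostrophe)
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
}


def _cp1252_fallback(err):
    """Decode bytes that are not valid UTF-8 as Windows-1252 instead."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Assumption: the stray bytes come from Windows-1252.
    return bad.decode("cp1252", errors="replace"), err.end


codecs.register_error("cp1252_fallback", _cp1252_fallback)


def fix_smart_quotes(data: bytes) -> bytes:
    """Return valid UTF-8 with smart quotes replaced by ASCII quotes."""
    text = data.decode("utf-8", errors="cp1252_fallback")
    return text.translate(ASCII_QUOTES).encode("utf-8")
```

Decoding with a fallback handler, rather than swapping bytes blindly, leaves any multi-byte sequences that are already valid UTF-8 untouched. Read the DTD in binary mode, pass the bytes through `fix_smart_quotes()`, and write the result to a new file so the original stays around for comparison.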
Interestingly, the problem does not occur in the other XHTML Mobile Profile DTDs. The XHTML MP 1.2 DTD, for instance, contains the same chunk of legalese but uses valid quote characters.
This DOCTYPE is turning up in more and more mobile pages these days, so my message is caveat validator – a validator is only as good as the DTD it makes use of. If you happen to run any kind of markup validation service that checks XHTML Mobile Profile documents, I would recommend using this modified DTD (of course it comes with no warranty or liability for dotMobi etc. 😉 )