Transcoding: what are the Viagra, Cialis and “Dear Winner” of HTML tag soup?

Over the last six months or so, part of the tech team at dotMobi has been working on Instant Mobilizer, a site mobilizer service that helps small businesses dip their toes in the waters of the mobile web. One of the big problems with content transcoding is deciding when it should be applied: if there is little hope of it working well, an alternative should be offered.

To work out whether content transcoding is likely to work well on a particular site or page, we decided to try out some Bayesian logic. Bayesian logic, named after its creator, the Reverend Thomas Bayes, appears to have become popular only about 270 years after it was devised, thanks to the invention of email and the subsequent deluge of spam. Thankfully, the advances in computing that made internet email possible have also made it practical to apply Bayesian probability at a reasonable speed, a development the Reverend Bayes never lived to see. As ever, new problems and new solutions advance in lockstep.

[Image: Rev. Thomas Bayes]

We trained a home-grown Bayes implementation on a corpus of human-rated known-good and known-bad sites (sites that do or don't transcode well to mobile versions). In our first iteration of the test we used simple HTML tags as the training tokens, and gave the Bayes algorithm no prior knowledge of the world apart from what humans had previously rated as good or bad experiences. After a good deal of number crunching, the following HTML tags emerged as the leading indicators of the likely success of page transcoding:

Good tags:

  • h4
  • blockquote
  • dl
  • address

Bad tags (the "viagra" of tags):

  • frame
  • frameset
  • noframes
  • object
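
As a rough illustration of how a ranking like this could be produced, the sketch below pulls tag names out of a handful of made-up pages with Python's standard html.parser module and scores each tag with a smoothed estimate of P(good | tag). The tiny corpus, the names TagCollector, page_tags and tag_scores, the equal priors and the add-one smoothing are all assumptions made for illustration; this is not our actual implementation.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the set of distinct tag names seen in a page."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # html.parser passes tag names in lower case


def page_tags(html):
    """Return the set of tag names that occur in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def tag_scores(good_pages, bad_pages):
    """Return {tag: smoothed P(good | tag)}, assuming equal class priors."""
    good_counts = Counter(t for page in good_pages for t in page_tags(page))
    bad_counts = Counter(t for page in bad_pages for t in page_tags(page))
    scores = {}
    for tag in set(good_counts) | set(bad_counts):
        # Add-one smoothing of P(tag | class), counted per page.
        p_tag_good = (good_counts[tag] + 1) / (len(good_pages) + 2)
        p_tag_bad = (bad_counts[tag] + 1) / (len(bad_pages) + 2)
        scores[tag] = p_tag_good / (p_tag_good + p_tag_bad)
    return scores


# Hypothetical hand-rated corpus, invented for this example.
GOOD = [
    "<html><body><h4>News</h4><blockquote>Quote</blockquote></body></html>",
    "<html><body><dl><dt>Term</dt><dd>Meaning</dd></dl>"
    "<address>Dublin</address></body></html>",
]
BAD = [
    "<html><frameset><frame src='a.html'>"
    "<noframes>Sorry</noframes></frameset></html>",
    "<html><body><object data='movie.swf'></object></body></html>",
]

scores = tag_scores(GOOD, BAD)
# Tags sorted from strongest "good" indicator to strongest "bad" indicator.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Tags that appear equally often in both classes, such as html, end up at 0.5 and carry no information; the interesting tags are the ones at either end of the ranking.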

The bad tags are pretty obvious — even on a PC browser frames are usually a usability disaster. The object tag also needs no explanation: there is not much that can be done with a Flash object to make it work well on mobile devices.

The good tags are more interesting:

  • h4 – probably indicates a semantically designed page
  • blockquote – tends to be used in "crafted", semantically-designed pages
  • dl (definition list) – rarely used tag, mostly used by HTML cognoscenti
  • address – ditto

In summary, and this isn't really surprising in hindsight, pages that are designed semantically (rather than visually) lend themselves best to being transcoded into other formats.
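
To show how per-tag evidence might be combined into a verdict for a whole page, here is a minimal naive Bayes sketch. The likelihood values in LIKELIHOODS, the function name probability_good and the choice to ignore tags with no training evidence are invented for illustration; a real system would estimate the numbers from the rated corpus. For simplicity it only uses the tags that are present on a page, not the ones that are absent.

```python
import math

# Hypothetical (P(tag | good), P(tag | bad)) pairs, invented for this example.
LIKELIHOODS = {
    "h4": (0.40, 0.10),
    "blockquote": (0.30, 0.08),
    "dl": (0.20, 0.05),
    "address": (0.15, 0.04),
    "frame": (0.02, 0.30),
    "frameset": (0.02, 0.30),
    "noframes": (0.02, 0.25),
    "object": (0.10, 0.45),
}


def probability_good(tags, prior_good=0.5):
    """Naive Bayes: P(good | tags), treating tags as independent evidence."""
    log_good = math.log(prior_good)
    log_bad = math.log(1.0 - prior_good)
    for tag in set(tags):
        if tag not in LIKELIHOODS:
            continue  # tags with no training evidence are ignored
        p_good, p_bad = LIKELIHOODS[tag]
        # Work in log space so many small factors do not underflow.
        log_good += math.log(p_good)
        log_bad += math.log(p_bad)
    # Convert the log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(log_bad - log_good))


semantic_page = {"html", "body", "h4", "dl", "p"}
framed_page = {"html", "frameset", "frame", "noframes"}

semantic_score = probability_good(semantic_page)
framed_score = probability_good(framed_page)
```

With these made-up numbers, the semantically marked-up page scores well above 0.5 and the framed page well below it, which is the kind of signal that could be used to decide whether to offer transcoding at all.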

© 2024 DeviceAtlas Limited. All rights reserved.

This is a website of DeviceAtlas Limited, a private company limited by shares, incorporated and registered in the Republic of Ireland with registered number 398040 and registered office at 6th Floor, 2 Grand Canal Square, Dublin 2, Ireland