Transcoding: what are the Viagra, Cialis and "Dear Winner" of HTML tag soup?

Posted by ronan 3 years 23 weeks ago

Over the last six months or so, part of the tech team at dotMobi has been working on Instant Mobilizer, a site mobilizer service to help small businesses dip their toes in the waters of the mobile web. One of the big problems with content transcoding is deciding when it should be applied: if there is little hope of it working well, an alternative should be offered.

To work out whether content transcoding is likely to work well on a particular site or page, we decided to try out some Bayesian logic. Bayesian logic, named after its creator Reverend Thomas Bayes, appears to have become popular only about 270 years after it was devised, thanks to the invention of email and the subsequent deluge of spam. Thankfully, the advances in computing that made internet email possible have also made it possible to apply Bayesian probability fast enough to be practical, something Reverend Bayes never lived to see. As ever, new problems and new solutions advance in lockstep.

We trained a home-grown Bayes implementation on a corpus of human-rated known-good and known-bad sites (sites that do or don't transcode well to mobile versions). In our first iteration of the test we used simple HTML tags as the training tokens, and gave the Bayes algorithm no prior knowledge of the world apart from what humans had previously rated as good or bad experiences. After a good deal of number crunching, the following HTML tags emerged as the leading indicators of the likely success of page transcoding:

Good tags:

  • h4
  • blockquote
  • dl
  • address

Bad tags (the "viagra" of tags):

  • frame
  • frameset
  • noframes
  • object

The bad tags are pretty obvious — even on a PC browser frames are usually a usability disaster. The object tag also needs no explanation: there is not much that can be done with a Flash object to make it work well on mobile devices.

The good tags are more interesting:

  • h4 - probably indicates a semantically designed page
  • blockquote - tends to be used in "crafted", semantically-designed pages
  • dl (definition list) - rarely used tag, mostly used by HTML cognoscenti
  • address - ditto

In summary, and this isn't really surprising in hindsight, pages that are semantically (rather than visually) designed lend themselves best to being transcoded into other formats.
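
For readers who want to try the idea, here is a minimal Python sketch of a Bayes classifier that uses HTML tags as its tokens. It is not our production code: the names (TagCollector, page_tags, TagBayes), the add-one smoothing, and the choice to score only the tags that are present on a page are all assumptions made for illustration, and the four training pages are toy data rather than our rated corpus.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the set of distinct tag names seen in a page."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.tags.add(tag)


def page_tags(html):
    """Return the set of tag names used in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


class TagBayes:
    """Simplified naive Bayes over tag presence (hypothetical helper)."""

    def __init__(self):
        self.page_counts = {"good": 0, "bad": 0}
        self.tag_counts = {"good": Counter(), "bad": Counter()}

    def train(self, html, label):
        """Record one human-rated page; label is 'good' or 'bad'."""
        self.page_counts[label] += 1
        self.tag_counts[label].update(page_tags(html))

    def _p_tag(self, tag, label):
        # Add-one smoothed estimate of P(tag present | label).
        return (self.tag_counts[label][tag] + 1) / (self.page_counts[label] + 2)

    def tag_log_odds(self, tag):
        """Positive values point towards 'good', negative towards 'bad'."""
        return math.log(self._p_tag(tag, "good") / self._p_tag(tag, "bad"))

    def p_good(self, html):
        """Estimated probability that a page transcodes well."""
        # Start from the smoothed prior odds of good versus bad pages.
        log_odds = math.log(
            (self.page_counts["good"] + 1) / (self.page_counts["bad"] + 1)
        )
        # Assumption: only tags present on the page contribute evidence.
        for tag in page_tags(html):
            log_odds += self.tag_log_odds(tag)
        return 1 / (1 + math.exp(-log_odds))


# Toy training data, invented for this example.
clf = TagBayes()
clf.train("<html><body><h4>News</h4><blockquote>Quote</blockquote></body></html>", "good")
clf.train("<html><body><dl><dt>a</dt><dd>b</dd></dl><address>x</address></body></html>", "good")
clf.train("<html><frameset><frame src='a.html'><noframes>n</noframes></frameset></html>", "bad")
clf.train("<html><body><object data='movie.swf'></object></body></html>", "bad")

print(clf.p_good("<html><body><h4>Title</h4><dl><dt>k</dt><dd>v</dd></dl></body></html>"))
print(clf.p_good("<html><frameset><frame src='b.html'></frameset></html>"))

# Rank the tags seen in training from most "good" to most "bad".
seen = set(clf.tag_counts["good"]) | set(clf.tag_counts["bad"])
for tag in sorted(seen, key=clf.tag_log_odds, reverse=True):
    print(tag, round(clf.tag_log_odds(tag), 2))
```

Sorting tags by their log odds is one plausible way to surface "leading indicator" lists like the ones above. With a real corpus you would want many more rated pages per class before trusting the ranking.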

Posted by mikerowehl 3 years ago

Hey Ronan! Awesome, that is a pretty interesting insight. How are you using the info out of that? Are you just using it to estimate how well you think the mobilizer will function? Or in some way feeding it into the transform itself and modifying how you munge the page?

Would be awesome to get some live info back from the system out in the wild and feed it back, allow for some A/B testing of transformation styles and see which algorithms work well across sites. Sounds like you folks are really taking advantage of the large footprint you have to move this along in a way that it couldn't have as an independent project. Fantastic!

Posted by ronan 3 years ago

Thanks Mike! Initially we'll be using this to help us pitch Instant Mobilizer to the right customers but, as you said, we can also use it to get a sense of the DNA of the site in question to decide how best to treat it. In the longer term, rather than training our algorithm on previously captured data, we'll use real data from prospective customers who did or didn't decide to go for the service based on the preview. Then over time we should be able to accurately predict how likely it is that a given site will be accepted by future customers.

Ronan Cremin, dotMobi

Posted by MauriceJNK 3 years ago

Nice article

Posted by igor.faletski 3 years ago

I think the approach is interesting, but nearly impossible to perfect. Instant Mobilizer is trying to make decisions on the priority of content in the mobile context based on XHTML tags. Instead, a human should solve this issue once and for all for their own website.
