Update 10/02/2009
We have just released version 2.3 of our API as a Beta with a whole host of new features that make integrating it into any PHP website much much easier. The new API also does all the hard work described in this tutorial for you, so stop reading and check it out. The download comes with extensive samples and documentation to help you get started too.
Find out more…
(Part II in a two-part series. See Part I.)
"DeviceAtlas is the world's most comprehensive database of mobile device information." [deviceatlas.com]. The database comes with an API that developers can use to determine the capabilities of devices browsing their website and in so doing adapt their content to make it suitable for the user’s context.
Part II of this tutorial points out some of the challenges that developers may encounter when implementing DeviceAtlas, such as user agent "sniffing" and data caching for increased performance. It also works through possible ways to address these challenges, with a few examples.
Background
In order to provide relevant content to users on mobile devices, content providers are forced to adapt their content to suit the capabilities of the user's device. To do this, the content provider must analyse the request sent by the user to try to determine what device they are actually using.
All devices, when connecting to a content provider will send a number of headers with details about the web browser being used, the character encoding they support, the MIME types they can handle and more. Unfortunately the information provided in these headers is often incomplete, inconsistent or even modified by proxies along the way so that the content provider faces an ever more complex task determining what device is trying to access their content.
The most effective clue in determining a user's device is the "User Agent" header. The DeviceAtlas API uses this header to query its extensive database of devices and to give the content provider details about any capabilities or limitations of the user's device. The provider can then be far more confident that they are delivering content that will be accessible and usable on that specific device.
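To make the headers described above concrete, here is a minimal sketch of how they can be inspected from PHP. PHP exposes most request headers in $_SERVER under keys prefixed with HTTP_ (so User-Agent becomes HTTP_USER_AGENT). The function name list_request_headers and the sample values are made up for illustration and are not part of the DeviceAtlas API.

```php
<?php
// Hypothetical helper (not part of DeviceAtlas): collect the request headers
// that PHP exposes in $_SERVER. Header names are upper-cased, dashes become
// underscores and an "HTTP_" prefix is added, e.g. User-Agent -> HTTP_USER_AGENT.
function list_request_headers($server)
{
    $headers = array();
    foreach ($server as $key => $value) {
        if (strpos($key, 'HTTP_') === 0) {
            // Strip the "HTTP_" prefix so only the header name remains
            $headers[substr($key, 5)] = $value;
        }
    }
    return $headers;
}

// Sample values for illustration only; in a real request pass $_SERVER instead.
$sample = array(
    'HTTP_USER_AGENT' => 'ExampleBrowser/1.0',
    'HTTP_ACCEPT'     => 'text/html',
    'REQUEST_METHOD'  => 'GET',
);
print_r(list_request_headers($sample));
```

Dumping these values for a few real handsets is a quick way to see how incomplete or inconsistent the headers can be.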
Introduction
DeviceAtlas simply provides the database of devices and functions to query this database with a "User Agent" string. Functionality such as caching the database and "sniffing" out the user agent string in unusual circumstances is left to the developer.
This second part of the tutorial builds on the basic concepts introduced in Part I, tackling the extra challenges you may face in implementing the API.
Target
Part II of the tutorial introduces some more advanced concepts than Part I. Beginners may not be familiar with the caching techniques but will certainly be able to use them as demonstrated.
System Requirements
To complete the tutorial you will need a web server (local or remote) with a recent version of PHP installed. Version 5.2.3 is the minimum requirement: the DeviceAtlas data is stored as a json file, and older versions are unable to recurse deep enough to load the data. One of the caching techniques uses a PHP extension called memcache, which will need to be installed for that example to work (more info about memcache at php.net). I am using Apache 2.2.4 with PHP 5.2.5 on Windows XP.
Using DeviceAtlas on older versions of PHP may be possible with some hacks, but that is outside the scope of this tutorial. Perhaps another day.
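If you are unsure whether your server meets these requirements, a quick check along the following lines may help. This is only a sketch: check_requirements is a hypothetical helper written for this tutorial, and the messages are my own wording.

```php
<?php
// Hypothetical helper: report which of the tutorial's requirements are unmet.
// Returns an array of human-readable problems (empty when everything is fine).
function check_requirements($phpVersion, $hasMemcache)
{
    $problems = array();
    // The tutorial needs PHP 5.2.3 or later to load the json data
    if (version_compare($phpVersion, '5.2.3', '<')) {
        $problems[] = 'PHP 5.2.3 or later is required; found ' . $phpVersion;
    }
    // The memcache extension is only needed for the memcache caching example
    if (!$hasMemcache) {
        $problems[] = 'memcache extension not loaded; the memcache example will not use the cache';
    }
    return $problems;
}

$problems = check_requirements(PHP_VERSION, extension_loaded('memcache'));
echo $problems ? implode("\n", $problems) . "\n" : "All requirements met\n";
```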
Time
This part of the tutorial will take between 10 minutes and half an hour depending on your level of competence with PHP. Advanced PHP developers will be able to skim over this tutorial and will most likely find the example code quite self-explanatory.
File Structure
Download the latest PHP version of the DeviceAtlas API (you will need to be logged in to mobiForge).
Unzip the contents into the web root of your project. I have called the project tut2 so my directory structure looks like this:
/tut2
    /doc
        (API Documentation…)
    /sample
        /json
            DeviceAtlas.json
        index.php
    da_api.php
    da_test.php
- doc contains the phpDoc for the API which is a useful reference for the functions offered by the API.
- sample contains a basic sample of the API in use and a development version of the json database. The sample contains an example of caching which we will examine in this tutorial.
- The two files in the root are da_api.php, the source code of the API, and da_test.php, a command-line script for testing the json.
To get started I have simply copied the contents of the sample directory into the root of the project so the project directory now looks like this:
/tut2
    /doc
        (API Documentation files…)
    /sample
        (Sample files…)
    /json
        DeviceAtlas.json
    index.php
    da_api.php
    da_test.php
You can set the directory structure out however you like. The important files we need are index.php, da_api.php and DeviceAtlas.json.
index.php will form the root of the project which will include API functions from da_api.php and load the device database from DeviceAtlas.json.
Tutorial
Caching
Option 1: Memcache
What is memcache?
Memcache is a PHP module that connects to a memcached server. This allows developers to store objects in memory for fast retrieval later.
"memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load" [http://www.danga.com/memcached/]
In the context of DeviceAtlas, using memcache allows us to store the device data-tree in memory after we have loaded it from the json. The sample code provided with the DeviceAtlas API shows how this can be done and we will begin by examining this code.
Start by taking a look at the code that is already in index.php. Notice that I have changed the file reference on line 2 as da_api.php is now in the same directory. You may also have to alter line 17 depending on how you have laid out your file structure:
 1  <?php
 2  include 'da_api.php';
 3
 4  echo '<pre>';
 5
 6  $s = microtime(true);
 7
 8  $memcache_enabled = extension_loaded("memcache");
 9  $no_cache = array_key_exists("nocache", $_GET);
10  if ($memcache_enabled && !$no_cache) {
11    $memcache = new Memcache;
12    $memcache->connect('localhost', 11211);
13    $tree = $memcache->get('tree');
14  }
15
16  if (!is_array($tree)) {
17    $tree = da_get_tree_from_file("json/DeviceAtlas.json");
18    if ($memcache_enabled && !$no_cache) {
19      $memcache->set('tree', $tree, false, 10);
20    }
21  }
22
23  if ($memcache_enabled && !$no_cache) {
24    $memcache->close();
25  }
26
27  $properties = da_get_properties($tree, $_SERVER['HTTP_USER_AGENT']);
28  //further performance can be gained through caching the properties (...)
29
30
31  $e = microtime(true);
32  print "Time taken: " . floor(($e - $s)*1000) . "ms\r\n";
33  print_r($properties);
34  echo '</pre>';
35  ?>
Functionally, this code will simply load the device data-tree and dump all the properties for the current device on screen.
The important sections to note are lines 8 – 25. All of these lines could, in a simple scenario, be replaced by line 17:
<?php
$tree = da_get_tree_from_file("json/DeviceAtlas.json");
?>
However, these lines demonstrate how it is possible to use the memcache module to cache the device data between requests. Line 17 requires PHP to parse the entire database from a json file which, in turn, requires the json string to be read from disk. This is an expensive round trip to make for every page load, especially on a server that services a large volume of requests.
On lines 8 and 9 we establish whether the memcache module is installed and whether the request is overriding the cache using a $_GET['nocache'] request variable. Following these checks we attempt to load the data-tree from memcache (lines 10 – 14).
Lines 16 – 21 are the fallback. If the tree has not been loaded we load it from the json file and store it in the cache if applicable. This means that subsequent requests can use the cached version of the tree until the cache expires.
By storing the time at the start (line 6) and calculating the difference at the end (line 31) of the code we are able to see the difference in processing time for multiple requests.
As mentioned in the comment on line 28 we could further speed up the process by caching the properties for each user agent we receive. This makes sense since over any one period of time we will receive numerous requests from the same user. By caching all the device properties we negate the need to load the api at all for subsequent requests from the same device.
I have modified the code a little to add this functionality:
<?php
echo '<pre>';

$s = microtime(true);

// Defaults, so the checks below also work when memcache is unavailable or bypassed
$properties = null;
$tree = null;
$cached = false;

$memcache_enabled = extension_loaded("memcache");
$no_cache = array_key_exists("nocache", $_GET);
if ($memcache_enabled && !$no_cache) {
  $memcache = new Memcache;
  $memcache->connect('localhost', 11211);
  // Try the cached properties for this user agent first, then the cached tree
  $properties = $memcache->get(md5($_SERVER['HTTP_USER_AGENT']));
  $cached = 'properties';
  if (!is_array($properties)) {
    $tree = $memcache->get('tree');
    $cached = 'tree';
  }
}

if (!is_array($properties)) {
  // Only load the API when the properties were not found in the cache
  include 'da_api.php';
  if (!is_array($tree)) {
    // Nothing cached at all: parse the json file and cache the tree
    $cached = false;
    $tree = da_get_tree_from_file("json/DeviceAtlas.json");
    if ($memcache_enabled && !$no_cache) {
      $memcache->set('tree', $tree, false, 1000);
    }
  }
  $properties = da_get_properties($tree, $_SERVER['HTTP_USER_AGENT']);
  if ($memcache_enabled && !$no_cache) {
    $memcache->set(md5($_SERVER['HTTP_USER_AGENT']), $properties, false, 30);
  }
}

if ($memcache_enabled && !$no_cache) {
  $memcache->close();
}

$e = microtime(true);
print "Time taken: " . floor(($e - $s)*1000) . "ms\r\n";
print "Used a " . ($cached ? 'CACHED' : 'FRESH') . " version of the " . ($cached ? $cached : 'tree and properties') . ".\r\n";
print_r($properties);
echo '</pre>';
?>
Notice also I have added a flag to the output so we can see when we are using cached data and which data we are using.
I have also modified the cache timeouts so that we hold on to our cached tree for longer (it's not gonna change until you update the json anyway.)
Rudimentary testing (a few successive refreshes) indicates that when we are using the cached properties the processing time for the page drops very significantly making this approach something that should be very seriously considered.
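A note on the cache key used above: memcached keys are limited to 250 bytes and may not contain whitespace or control characters, while user agent strings are often long and full of spaces. Hashing the string with md5() gives a fixed 32-character key that avoids both problems. The sketch below wraps this in a small helper; the function name properties_cache_key and the "da_props_" prefix are my own assumptions, added only to keep these keys apart from others such as 'tree'.

```php
<?php
// Hypothetical helper: build a memcache-safe key for a user agent string.
// md5() always returns 32 hexadecimal characters, so the key length is fixed
// and contains no spaces regardless of what the user agent looks like.
// The "da_props_" prefix is an assumption, not something DeviceAtlas requires.
function properties_cache_key($userAgent)
{
    return 'da_props_' . md5($userAgent);
}

// Made-up user agent string for illustration
$ua = 'Mozilla/5.0 (ExampleOS; U; en) ExampleBrowser/1.0';
echo properties_cache_key($ua), "\n"; // "da_props_" followed by 32 hex characters
```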
Unfortunately not everyone has access to memcache so we must consider other options. The massive savings achieved in caching the device properties point to a simpler solution that anyone could implement.
Option 2: Session level caching
Rather than caching the properties for the device based on the user agent string we cache the properties of the user's device IN their session. This obviously requires that we use sessions but this should be easier to do than using memcache.
Have a look at how I have changed things up quite a bit. All the memcache stuff has been stripped out, but the final product will, for all intents and purposes, look identical to the memcached version:
<?php
session_start();
echo '<pre>';

$s = microtime(true);

$no_cache = array_key_exists("nocache", $_GET);

// isset() avoids an undefined-index notice on the first request of a session
if (!isset($_SESSION['device_properties']) || !is_array($_SESSION['device_properties']) || $no_cache) {
  include 'da_api.php';
  $cached = false;
  $tree = da_get_tree_from_file("json/DeviceAtlas.json");
  $properties = da_get_properties($tree, $_SERVER['HTTP_USER_AGENT']);
  $_SESSION['device_properties'] = $properties;
} else {
  $properties = $_SESSION['device_properties'];
  $cached = 'properties';
}

$e = microtime(true);
print "Time taken: " . floor(($e - $s)*1000) . "ms\r\n";
print "Used a " . ($cached ? 'CACHED' : 'FRESH') . " version of the " . ($cached ? $cached : 'tree and properties') . ".\r\n";
print_r($properties);
echo '</pre>';
?>
Notice that we require the session_start() call before our page outputs anything (an important point to remember in a production site where you send 'content-type' or other headers).
Following that, the code is simple and similar to our memcache implementation. We check if we have cached data: if we do, we use it; if not, we get fresh data and cache it.
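The check-then-fill pattern described above can be pulled out into a small reusable function. The following is a sketch only: cached_properties and example_loader are hypothetical names, $store stands in for $_SESSION, and the loader returns made-up values where a real one would call the DeviceAtlas API.

```php
<?php
// Hypothetical helper illustrating the check-then-fill pattern.
// $store stands in for $_SESSION (or any array used as a cache);
// $loader is any callable that returns the device properties as an array.
// Returns array($properties, $wasCached).
function cached_properties(&$store, $loader, $noCache = false)
{
    if (!$noCache && isset($store['device_properties'])
        && is_array($store['device_properties'])) {
        return array($store['device_properties'], true);  // cache hit
    }
    $properties = call_user_func($loader);                 // expensive lookup
    $store['device_properties'] = $properties;             // fill the cache
    return array($properties, false);                      // cache miss
}

// Stub loader with made-up values; a real one would call the DeviceAtlas API.
function example_loader()
{
    return array('exampleProperty' => 'exampleValue');
}

$store = array(); // in a real page, use $_SESSION after session_start()
list($props, $hit) = cached_properties($store, 'example_loader');
var_dump($hit); // first call: cache miss
list($props, $hit) = cached_properties($store, 'example_loader');
var_dump($hit); // second call: cache hit
```

Passing the store by reference keeps the function independent of sessions, so the same code could sit in front of a different cache later.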
On a production server that may be servicing requests from a large number of clients, the benefits of this approach will be smaller than on a server that has fewer clients but a high volume of requests, because every new session still has to load the tree once. The obvious benefit of the memcache method is that clients with the same device type share the cached version of their device's properties.
Which method suits you will depend on your request volumes, your target audience and your server architecture. I would recommend setting up some tests similar to these and performance testing which method best suits your situation.
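For such tests, a small timing helper along these lines may be a useful starting point. It is a sketch under my own assumptions: average_ms and example_workload are hypothetical names, and the workload is a stand-in that you would replace with the lookup you actually want to measure.

```php
<?php
// Hypothetical benchmark helper: average wall-clock time of $fn in milliseconds,
// using the same microtime(true) technique as the examples in this tutorial.
function average_ms($fn, $iterations = 10)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($fn);
    }
    return (microtime(true) - $start) * 1000 / $iterations;
}

// Stand-in workload; replace it with a function that performs the cached or
// uncached lookup you want to compare.
function example_workload()
{
    return md5(str_repeat('x', 1000));
}

printf("Average: %.3f ms\n", average_ms('example_workload', 100));
```

Averaging over several iterations smooths out the noise you get from timing a single page refresh.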
"Sniffing" out the User Agent
You will notice that the first argument passed to da_get_properties is the device data-tree we loaded above.
The second argument is the server variable HTTP_USER_AGENT. This string represents the user agent header that was passed to the server. The DeviceAtlas API uses this string to determine the device and get its properties.
Unfortunately this is a highly simplified approach which may leave you scratching your head when DeviceAtlas seemingly can't detect your user's device.
The problem
Most mobile web users will be fetching content via a proxy of some sort which is usually controlled by their service provider or may even be part of their browser software. These proxies will often manipulate the User Agent header which makes life difficult for us developers.
One example of such a proxy is the Opera Mini browser. The reference documentation shows how to handle this situation and I am going to re-use that code to give an example of how to "sniff" out the correct User Agent.
Fortunately, although Opera Mini does manipulate the header, it also lets us know that it has done so and passes along the original value in a separate header, X-OperaMini-Phone-UA, which PHP exposes as HTTP_X_OPERAMINI_PHONE_UA. Notice how I have modified the code above to test for this header and use it where needed.
<?php
session_start();
echo '<pre>';

$s = microtime(true);

$no_cache = array_key_exists("nocache", $_GET);

// isset() avoids an undefined-index notice on the first request of a session
if (!isset($_SESSION['device_properties']) || !is_array($_SESSION['device_properties']) || $no_cache) {
  include 'da_api.php';
  $cached = false;
  $tree = da_get_tree_from_file("json/DeviceAtlas.json");

  //User Agent sniffing
  $opera_header = "HTTP_X_OPERAMINI_PHONE_UA";
  if (array_key_exists($opera_header, $_SERVER)) {
    // Opera Mini supplies the handset's original user agent in this header
    $ua = $_SERVER[$opera_header];
  } else {
    $ua = $_SERVER['HTTP_USER_AGENT'];
  }

  $properties = da_get_properties($tree, $ua);
  $_SESSION['device_properties'] = $properties;
} else {
  $properties = $_SESSION['device_properties'];
  $cached = 'properties';
}

$e = microtime(true);
print "Time taken: " . floor(($e - $s)*1000) . "ms\r\n";
print "Used a " . ($cached ? 'CACHED' : 'FRESH') . " version of the " . ($cached ? $cached : 'tree and properties') . ".\r\n";
print_r($properties);
echo '</pre>';
?>
Unfortunately there are numerous transcoding proxies that may modify the User Agent header that you, as the content provider, finally receive. If you know of one, please post the details in the comments below, and hopefully also how you were able to "sniff" out the original header.
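As more of these headers come to light, the Opera Mini check can be generalised into a list of candidate headers that is tried in order. The sketch below is one possible way to do that, not part of the DeviceAtlas API. The function name sniff_user_agent is hypothetical, and apart from the Opera Mini header the candidate names are assumptions that you should verify against your own traffic before relying on them.

```php
<?php
// Hypothetical helper: return the first non-empty candidate header, falling
// back to the standard User-Agent header (or '' when that is missing too).
function sniff_user_agent($server, $candidates = null)
{
    if ($candidates === null) {
        $candidates = array(
            'HTTP_X_OPERAMINI_PHONE_UA',  // Opera Mini, as described above
            'HTTP_X_DEVICE_USER_AGENT',   // assumption: verify before relying on it
            'HTTP_X_ORIGINAL_USER_AGENT', // assumption: verify before relying on it
        );
    }
    foreach ($candidates as $key) {
        // empty() is safe on missing keys and also skips blank values
        if (!empty($server[$key])) {
            return $server[$key];
        }
    }
    return isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '';
}

// Made-up request values for illustration; in a real page pass $_SERVER.
$request = array(
    'HTTP_USER_AGENT'           => 'ExampleProxy/1.0',
    'HTTP_X_OPERAMINI_PHONE_UA' => 'ExamplePhone/2.0',
);
echo sniff_user_agent($request), "\n"; // the original handset string wins
```

The result can then be passed to da_get_properties in place of $_SERVER['HTTP_USER_AGENT'].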
Conclusion
DeviceAtlas provides the basic tools you require for device detection and content manipulation. Rather than enforcing caching or user agent sniffing methodology upon you this is left for you to design in a way that best suits your specific context.
Further reading regarding transcoding proxies (a hot topic of debate amongst mobile web developers) and the work that is being done by the W3C to develop some standards for their use can be found on this post.