Audio and video are very important tools in the arsenal available to web developers, marketing and sales types and content providers (to name but a few). Few are likely to remember the internet before YouTube, before on-demand playback of your favourite TV shows or even before Google for that matter, which means that the ability to embed movies and rich content in a webpage is almost certainly taken for granted in today’s always-on world.
When I started researching the new multimedia properties for DeviceAtlas, I believed that it would be a relatively simple affair. Maybe a few new properties, such as “supports playback of .3gp”, or “supports streaming of .mp4”. I honestly felt that it would be a walk in the park.
How wrong I was :)
Envelopes/Containers, Codecs, Profiles & Levels
Envelopes
An envelope, or container, is a file format that contains multimedia content, compressed using one of the various audio and video codecs. The most popular envelopes are listed on the Wikipedia Container Format (digital) page, and include WAV (or RIFF), 3GP, AVI and MOV to name but a few. Odds are that you know most of the envelopes without even realising it, as the file extensions used for most audio/video files hint at their container type.
Codecs
A codec (from the words coder-decoder) is used to encode and decode digital content, and usually applies some form of compression. Two of the more well-known examples are the uncompressed PCM audio normally found in WAV files (as in “Microsoft WAVE”) and MP3, both used for the encoding of digital audio. Strictly speaking, WAV is itself a container, which is why it also appears among the envelopes above, but in practice it almost always carries uncompressed PCM. These two differ in a few attributes, based on the level of compression involved, for example:
- Filesize: WAV is almost always uncompressed (largest filesize), MP3 is compressed much more heavily (much smaller filesize)
- Quality: uncompressed WAV loses nothing from the source (best quality), MP3 is lossy (quality varies with the level of compression)
Here is a more detailed list of codecs in general, and of video codecs.
Profiles & Levels
A profile, when used in the context of codecs and envelopes, is essentially a group of features. Depending on the grouping, and the codec used, there may also be a further grouping of features known as “Levels”. A modern PC is more than capable of handling pretty much any profile within a particular codec, so long as it’s told how to. However, a mobile phone isn’t quite as powerful as your desktop PC, so its level of support for multimedia playback won’t be as plentiful as the PC. Because of this limitation, manufacturers aim toward support of certain groups of properties, or profiles/levels instead.
The H.264 profile and levels sections on Wikipedia give you a small insight into this, but the MPEG-4 Part 2 profile section may be easier for you to glance over, as it only contains 2 profiles.
If you take a quick look over the “levels” table I linked to above, you’ll see what I mean – here are a few important points to note:
- Each individual level has a different maximum possible bitrate, depending on which profile you choose.
- Each level supports different resolutions
- Each level supports different frame rates
It’s not hard to see just how many possible combinations there are, and how this would very quickly overwhelm even the most interested student. And this is only for one codec!
Overall, the most important thing to remember from all of this is that when you’re creating any content for consumption on mobile devices, you really have to think about your target audience. Should you create a lower-quality video for use on lower-end devices, as well as medium- and high-quality videos for use on higher-end devices? My view is that the right thing to do would be to create a small set of videos for each band of device, so you would have maybe 1 video for the low-end devices, 1 or 2 for the devices sitting on the fence (mid-range) and perhaps 2 or 3 for the higher-end ones. This would give you a much wider range of users to target, while making sure that anyone on a lower-end device is still catered for!
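As a rough illustration of the banding idea, here is a minimal Python sketch. The band names, file names and bitrates are all made up for the example, and the `band` argument stands in for whatever your device detection tells you about the handset; none of it is a real DeviceAtlas API.

```python
# Hypothetical renditions per device band. Every file name and number here
# is illustrative only, not a recommendation.
RENDITIONS = {
    "low": [{"file": "clip_low.3gp", "kbps": 64}],
    "mid": [{"file": "clip_mid_a.3gp", "kbps": 128},
            {"file": "clip_mid_b.mp4", "kbps": 256}],
    "high": [{"file": "clip_high_a.mp4", "kbps": 384},
             {"file": "clip_high_b.mp4", "kbps": 768}],
}


def pick_rendition(band, max_kbps=None):
    """Return the best rendition for a device band.

    Unknown bands fall back to "low" so that nobody is left without a video.
    If max_kbps is given, prefer the highest bitrate that fits under it and
    fall back to the smallest rendition in the band when nothing fits.
    """
    candidates = RENDITIONS.get(band, RENDITIONS["low"])
    if max_kbps is not None:
        fitting = [r for r in candidates if r["kbps"] <= max_kbps]
        if not fitting:
            return min(candidates, key=lambda r: r["kbps"])
        candidates = fitting
    return max(candidates, key=lambda r: r["kbps"])
```

Falling back to the low band for anything unrecognised is a deliberate choice here: a video that is a little too small is far less annoying than one that will not play at all.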
Of course, you’d still need to be able to tell what type of device is being used to access your services, but that’s where DeviceAtlas comes in :)
Content Types
Moving on from the endless list of envelopes, codecs, profiles and levels, we come to content types (or MIME types, if you’d prefer). Device manufacturers sometimes state support of a particular type of media for their devices, such as my trustworthy Nokia E65, as described on the well presented Forum Nokia device details page for the phone.
Unfortunately, the page doesn’t provide much information in the way of multimedia support, but it clearly states that the E65 supports “AMR-WB“. However, some testing showed that this support (in the 3GP envelope, at least) is conditional on the MIME type. According to the 3GPP RFC, the following is understood:
- “audio/3gpp” *may* be used for files containing audio only
- “video/3gpp” *must* be used for files containing video
- “video/3gpp” *may* also be used for audio-only files if desired
The last point here is of particular importance, as my E65 was unable to play any audio-only files, when they were delivered using “audio/3gpp“. But when sent with the generic “video/3gpp” MIME type, they worked perfectly. (This is a hint – *always* use the video alternative, regardless of file content, for 3GP files).
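A minimal sketch of that hint in Python, assuming you control the Content-Type header your server sends. The function name is made up for the example, and deferring to the standard `mimetypes` module for every other extension is just one reasonable default.

```python
import mimetypes


def content_type_for(filename):
    """Pick the Content-Type header value to send for a media file.

    3GP files always get "video/3gpp", even when they only contain audio,
    because some handsets refuse audio-only files sent as "audio/3gpp".
    """
    if filename.lower().endswith(".3gp"):
        return "video/3gpp"
    # For everything else, defer to the standard library's guess and fall
    # back to a generic binary type when it has no idea.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```

So `content_type_for("ringtone.3gp")` gives `video/3gpp` whether or not the file has a video track.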
This kind of thing is incredibly important when it comes to mobile devices. You simply cannot rely on their stated support matching their actual support of the various envelopes, codecs, profiles or levels, and this only really becomes a problem when something goes wrong, by which time it is, of course, too late.
Some of you may be wondering about another MIME type that I could have sent – “audio/amr“, as mentioned in Juanin’s excellent article on Content Delivery for Mobile Devices (which is well worth a read if you are interested in this). There’s a simple explanation for this: yes, we are testing for the playback and streaming support of an audio codec (AMR), but it is in a video envelope (3GPP)!
The devil is in the detail here I’m afraid, and this is another very important point to note: you must think very carefully about how you will combine all of the elements (envelope + video codec, or envelope + audio codec) before you’ll know how you can deal with the resulting file.
Streaming
Another topic I’d like to discuss briefly is the wonderful world of streaming. In order to cover our new device properties adequately, and to cater for the ever-increasing pool of website owners that would like to stream content to mobile users, I had to set up a video/audio streaming server.
Darwin Streaming Server is the tool of choice when it comes to this topic, so naturally my quest for multimedia enlightenment was drawn in that direction. Setting up the streaming server was a relatively simple affair, with the main issues behind getting it up and running being a firewall misconfiguration (it expects port 554 to be open to the world on both TCP and UDP), and file encoding issues (it isn’t able to stream *everything* you throw at it).
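If you suspect the firewall, a quick way to check the TCP side of port 554 is to send the server an RTSP OPTIONS request and see whether anything comes back. The following is only a sketch: the function names are made up for the example, it uses nothing beyond Python's standard `socket` module, and it says nothing about whether UDP is getting through.

```python
import socket


def build_options_request(url, cseq=1):
    """Build a minimal RTSP OPTIONS request for the given rtsp:// URL."""
    return (
        f"OPTIONS {url} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_line(response):
    """Return the numeric status code from an RTSP response, or None."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split(" ", 2)
    if len(parts) >= 2 and parts[0].startswith("RTSP/") and parts[1].isdigit():
        return int(parts[1])
    return None


def probe(host, port=554, timeout=5.0):
    """Connect over TCP, send OPTIONS and return the status code (or None)."""
    url = f"rtsp://{host}:{port}/"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_options_request(url))
        return parse_status_line(sock.recv(4096))
```

Calling `probe("streaming.example.com")` (a placeholder host name) and getting 200 back suggests the TCP port is reachable, while a timeout or a refused connection points towards the firewall.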
The initial test run of the server was done using a sample video file that it’s shipped with, and the first time I connected to the stream using my phone I felt a sudden rush of excitement when I saw the moving animation appear on my screen! I know, only a true nerd would feel like that, but believe me, it’s still a pretty rare thing to be able to stream a video from a server to a mobile phone, so I got caught up in the moment :)
While it doesn’t have a very advanced interface for administration, it does provide you with some rudimentary tools for creating and managing audio and video streams.
The biggest flaw in the system, that we found during our testing, is the lack of any detailed logging of connection attempts or stream information, which would help with the inevitable debugging required when dealing with the various types of devices that we have at our fingertips here in dotMobi.
This flaw, however, resulted in the discovery of some interesting information regarding connection attempts of the Sony Ericsson V640i (that Andrea constantly maintains is better than any phone I’ve ever owned). We noticed, using a combination of TCPDump and UDPDump (both are really just tcpdump with different filters; the commands used are quoted below), that the SE phone connected to the streaming server, saw that it didn’t support the stream in question, disconnected and finally informed the user that it could not connect to the server! Yes, you read that right: it connected, fell over and told the user there was a connection error, rather than telling them it was unable to support the stream!
TCP/UDP Dump & Expected Output
TCPDump: We wanted to ignore all traffic over port 22 (SSH), so we used the following command:
tcpdump -X -n -i eth0 not port 22 and tcp
UDPDump: This was considerably easier:
tcpdump -n -i eth0 udp
When you run these commands, you should immediately see if the server is doing what it should be doing.
In the case of a tcpdump, the output should look similar to:
10:01:46.745274 IP x.x.x.x.35393 > x.x.x.x.554: S 6354390:6354390(0) win 65535 <mss 1460,nop,wscale 1,nop,nop,sackOK>
0x0000: 4500 0034 006e 4000 7706 e471 54cb 8c62 E..4.n@.w..qT..b
0x0010: 0ae2 32d5 8a41 022a 0060 f5d6 0000 0000 ..2..A.*.......
0x0020: 8002 ffff cd90 0000 0204 05b4 0103 0301 ................
0x0030: 0101 0402 ....
10:01:46.764485 IP x.x.x.x.35393 > x.x.x.x.554: . ack 1 win 32850
0x0000: 4500 0028 006f 4000 7706 e47c 54cb 8c62 E..(.o@.w..|T..b
0x0010: 0ae2 32d5 8a41 022a 0060 f5d7 4dc6 a347 ..2..A.*...M..G
0x0020: 5010 8052 9cec 0000 P..R....
10:01:46.768711 IP x.x.x.x.35393 > x.x.x.x.554: P 1:139(138) ack 1 win 32850
0x0000: 4500 00b2 0070 4000 7706 e3f1 54cb 8c62 E....p@.w...T..b
0x0010: 0ae2 32d5 8a41 022a 0060 f5d7 4dc6 a347 ..2..A.*.`..M..G
0x0020: 5018 8052 28a8 0000 4f50 5449 4f4e 5320 P..R(...OPTIONS.
You’ll see a flurry of activity, similar to that shown above, which will contain various RTSP commands, such as OPTIONS, DESCRIBE, PLAY and TEARDOWN (this is not a complete list).
When the TCP dump stops scrolling, the UDP dump should start instantly, which is a sign that your server is running correctly, since UDP is what is used to stream your content! The following is some sample output from the UDP dump:
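If the dump scrolls past too quickly to spot those commands by eye, a few lines of Python can pick them out of saved `tcpdump -X` output. This is a naive sketch with made-up function names: it decodes the hex column of each dump line and does a plain substring search, so it can produce false positives (PLAY would also match inside a longer word).

```python
import re

# RTSP methods to look for; like the list in the text, this is not exhaustive.
RTSP_METHODS = (b"OPTIONS", b"DESCRIBE", b"SETUP", b"PLAY", b"PAUSE", b"TEARDOWN")

# One hex group in the dump: one or two bytes, i.e. 2 or 4 hex digits.
_HEX_GROUP = re.compile(r"^(?:[0-9a-fA-F]{2}){1,2}$")


def hex_line_to_bytes(line):
    """Decode the hex column of one `tcpdump -X` dump line into bytes.

    Expects lines such as "0x0020: 5018 8052 ... P..R(...". At most eight
    hex groups are read, so the trailing ASCII column is ignored on full
    lines. On a short final line, an ASCII column that happens to look like
    hex could be misread.
    """
    tokens = line.split()
    if not tokens or not tokens[0].startswith("0x"):
        return b""  # timestamp/summary lines carry no hex data
    groups = []
    for token in tokens[1:9]:
        if not _HEX_GROUP.match(token):
            break
        groups.append(token)
    return bytes.fromhex("".join(groups))


def rtsp_methods_in(dump_lines):
    """Return the RTSP methods that appear in the decoded packet bytes."""
    payload = b"".join(hex_line_to_bytes(line) for line in dump_lines)
    return [m.decode("ascii") for m in RTSP_METHODS if m in payload]
```

Feeding it the third packet from the sample output above reports OPTIONS, which matches the ASCII column on the right of the dump.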
10:09:51.057671 IP x.x.x.x.6970 > x.x.x.x.3456: udp/vt 28 128 / 56
10:09:51.057682 IP x.x.x.x.6970 > x.x.x.x.3456: udp/vt 1450 128 / 24
10:09:51.057688 IP x.x.x.x.6970 > x.x.x.x.3456: udp/vt 1450 128 / 24
10:09:51.057695 IP x.x.x.x.6970 > x.x.x.x.3456: udp/vt 1450 128 / 24
10:09:51.057700 IP x.x.x.x.6970 > x.x.x.x.3456: udp/vt 1450 128 / 24
10:09:51.057705 IP x.x.x.x.6970 > x.x.x.x.3456: udp/vt 1450 128 / 24
10:09:51.057713 IP x.x.x.x.6970 > x.x.x.x.3456: udp/vt 1450 128 / 24
This data will fly by quicker than you can read it, so it will be obvious if it’s working!
As soon as you hit pause or stop on your device, the UDP stream should stop dead. Continuing or restarting the playback on your device should result in a flurry of TCP and UDP activity, until it eventually settles back down into UDP only traffic.
The Future
Whether we like it or not, multimedia is becoming a bigger and more important part of our daily online lives. We’re connected to the internet almost constantly, we have near-instant access to our emails no matter where we are, and it’s only a matter of time before we’re streaming live TV to our phones on the bus while making our way into work in the mornings (in some cases, this is already a reality).
Details of audio and video support on mobile devices will no doubt quickly become a big part of what we, as developers, rely on for the distribution of content to users, and I for one cannot wait to watch my favourite episodes of Family Guy, The Unit or Two and a Half Men on the bus, without needing enough internal storage capacity to supply a small data centre :)
However, unless we can populate these properties with valid information, then they’re not going to do us much good. For that reason I strongly recommend that you visit http://ta-da.mobi on your mobile phone, login using your mobiForge/DeviceAtlas credentials, and start tap-tap-tapping away. Remember, every answer you provide validates our data. If you think something is wrong, let us know by contributing to DeviceAtlas, using either TA-DA or the Web Interface!
Keep your eyes peeled for another article in the near future on how to use these new DeviceAtlas properties to give you more flexibility in how you serve content to your mobile users!