When it comes to site optimisation, we tend to focus a lot on page weight and bandwidth to the device. While these are very important, they are not the whole picture. Optimising for speed matters for all classes of end-user device – mobile and desktop alike. However, mobile devices pose some additional technical challenges, which make them especially sensitive to site performance. As we will see, no matter how much bandwidth is actually available to a device, there are limiting effects that can prevent it from being fully utilised. The traffic profile associated with HTTP tends to be more latency-sensitive than bandwidth-sensitive, in part because of the short, bursty nature of the connections. While a high-bandwidth connection may be good for downloading large files or streaming media, when it comes to fetching websites, latency is more important. It is no wonder that we fixate on bandwidth, though. Connectivity is sold on the basis of bandwidth. After all, look at how it is marketed.
Before we start looking at why latency should be a greater concern, there is an important difference to point out between it and bandwidth: how we provision one compared to the other is dramatically different. Stuart Cheshire put it very well, quite a while ago now, in his essay "It's the Latency, Stupid". Suppose we have a link with certain bandwidth and latency characteristics. If we run out of bandwidth, it is relatively simple to increase our capacity: we just add an additional link and we have twice the bandwidth. As far as bandwidth goes, we can always add extra capacity. However, it is very difficult to reduce the latency of a connection. Adding another link will not help – the latency of two links is, at best, the same as the latency of one. Improvements in latency only come through improvements in the underlying technology.
Ultimately, we are limited by the speed of light. The fastest that data can travel, no matter the medium, can never be faster than the speed of light. Light can travel through fiber at around 200,000km per second. So, taking a link from Dublin to New York as an example (approximately 5,000km), it will take 25ms for a packet to travel that distance. Even if we consider the absolute optimal conditions (light travelling through a vacuum along the shortest path), it is not going to be dramatically better. We are already very well optimised, at least in terms of fiber, for transmission over large distances. We are within a factor of two of the theoretical minimum latency. In any case, it is actually the short distances, in the last mile, which tend to cause us problems, or, at least, this is where there is the greatest waste.
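To make that arithmetic concrete, here is the 25ms figure computed directly, using the approximate speeds quoted above:

```python
# Rough propagation-delay figures for the Dublin-New York example above.
# Speeds are the article's approximations: ~200,000 km/s in fiber,
# ~300,000 km/s in a vacuum (the theoretical best case).

def one_way_delay_ms(distance_km: float, speed_km_s: float) -> float:
    """One-way propagation delay in milliseconds."""
    return distance_km / speed_km_s * 1000

fiber = one_way_delay_ms(5_000, 200_000)
vacuum = one_way_delay_ms(5_000, 300_000)

print(f"fiber:  {fiber:.0f} ms one-way")   # 25 ms
print(f"vacuum: {vacuum:.1f} ms one-way")  # ~16.7 ms
```

The gap between the two numbers is the "factor of two" headroom mentioned above: even a perfect straight-line vacuum link would only shave around a third off the fiber time.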
Causes of Latency – The Last Mile
A large proportion of latency is actually introduced in the last mile. These are the 'end' bits of the network connection. This could be your ISP supplying your DSL at home, or your cell operator if you are on the move. The last mile itself is made up of a couple of hops. First, you have the connection from the end-user's device to the operator's network. At home, this might be your personal WiFi network, served by your modem. For a mobile device, this could be a 2G, 3G, 4G or WiMAX link. Either way, for mobile devices, the first hop is always going to be over a radio medium of some sort. Radio technologies are generally pretty poor in terms of latency. Here are some latencies that different types of radio will typically introduce:
- 2G : 300–1,000 ms
- 3G : 100–500 ms
- 4G : < 100 ms
There will probably also be a couple of other hops within the ISP's infrastructure before you hit the internet proper. Depending on how congested the ISP's network is, how many users are online, and so on, varying levels of additional latency can be introduced here. In fact, given the higher bandwidth promised by 4G, there is likely a greater chance of congestion in the backhaul. It is much easier for a cell tower to become oversubscribed when the bandwidth allocated to each device is much greater.
Some recent studies (1, 2) have shown that WiFi connections carry the majority of users' data. So, let's look a bit at how WiFi works. Due to its Ethernet heritage, WiFi has inherited a CSMA-style approach to transmission scheduling (CSMA/CA, a close cousin of Ethernet's CSMA/CD). There is no central scheduler ensuring that the connection is being used fairly and optimally. Each member of the network must work together to try and share the connection efficiently. After all, WiFi is, by its very nature, a shared medium. Each device has to share the same invisible wire. When a WiFi device wants to send some data, it:
- Listens to see if another device is currently transmitting
- If another device is transmitting, it waits until that device is done
- Then, it immediately starts transmitting
So, the greater the number of devices on the network, the greater the chance of a collision – of two devices transmitting at the same time. When a collision occurs, both clients must back off for a random interval and try again. In fact, this is something 4G specifically attempts to address, by scheduling transmission windows for specific devices. What this means is that, unless you live in the wilderness (and do not own a microwave), you will see a great deal of variability in the latency of the first hop – between your device and your WiFi access point. Since access to the medium is contended and the backoff is random, transmission success is probabilistic. There are no guarantees when it comes to WiFi latency.
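The effect of adding devices can be sketched with a toy simulation. This is not the real 802.11 DCF algorithm – the 16-slot contention window and the simple "same slot means collision" rule are assumptions for illustration – but it shows why more stations mean more collisions:

```python
import random

def collision_probability(n_devices: int, window: int = 16,
                          trials: int = 100_000) -> float:
    """Estimate the chance that two or more devices pick the same random
    backoff slot (and therefore collide) in one contention round."""
    collisions = 0
    for _ in range(trials):
        slots = [random.randrange(window) for _ in range(n_devices)]
        if len(set(slots)) < n_devices:  # any shared slot => collision
            collisions += 1
    return collisions / trials

for n in (2, 5, 10):
    print(f"{n} devices: ~{collision_probability(n):.0%} collision chance per round")
```

With two devices the chance of a clash per round is small, but it climbs steeply as stations are added – the birthday-problem effect on a shared medium.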
Even if you only have one device on your network, you still have to compete with other nearby networks. WiFi operates in unlicensed spectrum, of which there is only a limited width. Chances are good that your network is sitting on the same spectrum as at least one of your neighbours' networks. In addition, there could be other, non-WiFi devices spilling out noise, such as baby monitors. If you ping your router, you are likely to see blips of latency into the 100ms-and-above range. That is equivalent to almost four transatlantic hops (or two transatlantic RTTs), and it is only going across the room.
Another important thing to point out about how WiFi works is how it handles collisions and retransmissions. In order to detect that a frame has successfully been sent from WiFi client to WiFi access point, at the data link layer, each frame must be ACK’d by the access point. However, the WiFi protocol (pre-802.11n) only permits one in-flight frame at a time. This means that each frame must be sent and acknowledged one-by-one.
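The cost of this stop-and-wait behaviour is easy to quantify: with one frame in flight at a time, throughput is capped at one frame per round trip, whatever the radio's raw bit rate. A small sketch (the 1500-byte frame and the 5ms client-to-AP round trip are assumed figures):

```python
def stop_and_wait_throughput_mbps(frame_bytes: int, rtt_ms: float) -> float:
    """Maximum throughput when only one frame may be in flight at a time:
    one frame per round trip, regardless of the link's raw bit rate."""
    return frame_bytes * 8 / (rtt_ms / 1000) / 1_000_000

# A ~1500-byte frame with 5 ms of client<->AP round-trip time:
print(f"{stop_and_wait_throughput_mbps(1500, 5):.1f} Mbit/s")  # 2.4
```

Notice that the radio could be rated at 54Mbit or more; under these assumptions, the per-frame acknowledgement alone caps it far below that.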
Causes of Latency – Internet
The distance that data travels will be a big determining factor here for latency. As I’ve mentioned already, we are very well optimised here and there is not much we can do to improve things – at least in terms of traversing large physical distances. If the latency is too high, one answer is to put the data closer to the user, by, for example, leveraging a CDN.
Another concern, not only here but I suppose anywhere, is what is called bufferbloat. A router will, necessarily, have some memory allocated to temporarily holding packets in flight. As the capacities of links have risen, so have the buffer sizes. Unfortunately, some have become a little too big. A properly functioning TCP connection will see a little packet loss. This is, after all, how a sender knows when to back off – a sort of passive backpressure. Over-large buffers in network devices mask this loss and cause delayed and irregular delivery, also known as jitter.
Causes of Latency – DNS
Before a client can even begin to request a page, it needs to resolve the hostname into an IP address. A given web page will often include assets from multiple domains, so it will require multiple DNS lookups. In the ideal case, the domain is cached on the device, so the cost is virtually zero. However, on a cold page load, the client will need to send a UDP packet to its DNS server. Assuming the DNS server is within the operator's network, which is a reasonable assumption, the DNS lookup will take one RTT of last mile latency. If the client's DNS server cannot resolve the hostname, then it will need to recurse, and this will add still more latency.
What is important to note is that the latency introduced as part of the DNS lookups has nothing at all to do with page weight or the bandwidth available to the client. We are only exchanging a handful of packets. A higher bandwidth connection will not speed this process up. A lower latency connection will.
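A rough way to see this on your own machine is to time a resolution directly. The sketch below uses Python's resolver; 'localhost' resolves locally so the cost is near zero, whereas a real domain on a cold cache would add one or more resolver round trips on top:

```python
import socket
import time

def resolve_time_ms(hostname: str) -> float:
    """Time a single name resolution. The cost is dominated by round trips
    to the resolver, not by bandwidth -- only a handful of packets move."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, 80)
    return (time.perf_counter() - start) * 1000

# 'localhost' resolves locally; try a cold external domain for comparison.
print(f"localhost: {resolve_time_ms('localhost'):.2f} ms")
```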
TCP Three Way Handshake
In order to ensure that the page the server emits is the page the client receives, we need a reliable transport for it to travel over. TCP is that transport, and it abstracts the work of packet reordering, retransmission and so on away from the layers above it. TCP uses sequence numbers to uniquely identify each byte of data exchanged. Before the client and server can start exchanging data, they need to agree on a starting point for the sequence numbers. This is one of the purposes of the TCP three way handshake. Roughly, the three way handshake works like this:
- SYN – Client generates a random sequence number and sends a packet with the SYN bit set to server
- SYN/ACK – Server generates its own random sequence number, acknowledges the client's, and responds with the SYN and ACK bits set
- ACK – Client acknowledges the server's sequence number, completing the handshake
Immediately after the client sends the ACK it can start sending its HTTP request to the server, so, as you can see, the connection setup time will be at least two times the one-way latency between client and server – or 1 RTT, end-to-end. Again, what is significant is that the speed at which we can do this has nothing at all to do with the bandwidth of the connection or the weight of the page. It does not matter whether we have 1Kbit or 1Gbit. We are only exchanging three packets, and latency is the only factor that affects the performance of this interaction. So, before we can start fetching a page, we must wait 1 RTT purely for connection setup.
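You can observe this setup cost by timing connect(), which returns roughly one RTT after the SYN goes out. The sketch below measures it against a throwaway local listener, so the number will be tiny; against a real server over a last-mile link, it would be a full RTT:

```python
import socket
import time

def connect_time_ms(host: str, port: int) -> float:
    """Time socket.connect(); it returns once the handshake completes,
    i.e. roughly one RTT after the SYN was sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        start = time.perf_counter()
        s.connect((host, port))
        return (time.perf_counter() - start) * 1000

# Demo against a throwaway local listener; the kernel completes the
# handshake for queued connections even before accept() is called.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print(f"handshake took {connect_time_ms('127.0.0.1', port):.3f} ms")
server.close()
```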
An individual page will be made up of many assets from multiple sources. Browsers these days will make multiple connections to each domain in order to fetch content in parallel where possible. However, we still need to pay the connection setup penalty for each unique domain that the page content is hosted on.
TCP Slow Start and Congestion Control
Once a TCP connection has been established, neither client nor server has any way of knowing the bandwidth available along the route between them. In any case, even if each knew the bandwidth of its own link, the maximum bandwidth available is that of the slowest intermediate hop. This is where TCP slow start comes in. It is a strategy to prevent too much data from being injected into the network. For each segment that the receiver sees, it sends an acknowledgement back to the sender. The sender maintains a congestion window (CWND). The CWND is a count of the unacknowledged segments allowed in flight at any one time. After connection setup, the CWND is set to, unsurprisingly, the Initial Congestion Window size (INITCWND). This is a fixed value for all new connections. The value varies by OS, but for Linux 2.6 kernels it is 3. For each acknowledgement that the sender gets from the receiver, the CWND is incremented by one. This way, the CWND doubles for each round of segments sent. So, slow start is a bit of a misnomer, as the ramp-up is actually pretty rapid.
The CWND cannot grow forever, and eventually a buffer will fill up somewhere and a segment will be dropped. This will leave a segment unacknowledged. When the sender sees that a segment has been dropped, it assumes that the cause was congestion. There can be many reasons for the segment to be dropped, but it is reasonable to assume that it is due to congestion. The sender then shrinks the CWND so as not to further stress the network. This is the end of the slow start phase of the connection and the beginning of the congestion avoidance phase. During congestion avoidance, the CWND is again incremented, only this time linearly instead of exponentially.
So, what does this all mean for latency? As you can see, the sender has to probe the network to assess how much bandwidth is available. For new connections, we cannot immediately use all of the available bandwidth. Suppose we want to download a 40k file over a 5Mbit connection with an RTT of 300ms. Let's also say our MSS is 1460 bytes. Looked at naively, you would think it should take only 600ms to download the file: 1 RTT for connection setup and 1 RTT to fetch the file, since there is plenty of capacity to send all 29 segments at once. However, if we factor in slow start, it goes something like this (from the server's perspective):
- 1st RTT three way handshake
- 2nd RTT receive GET request, CWND = 3, send 3 segments = 4,380 bytes
- 3rd RTT receive 3 acks, CWND = 6, send 6 segments = 8,760 bytes (total 13,140)
- 4th RTT receive 6 acks, CWND = 12, send 12 segments = 17,520 bytes (total 30,660)
- 5th RTT receive 12 acks, CWND = 24, send 8 segments = 10,300 bytes (total 40,960)
The transfer actually takes five RTTs. That is, 5 * 300ms = 1,500ms.
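The breakdown above can be replayed in a few lines of Python, counting one RTT for the handshake and doubling the CWND each round:

```python
def slow_start_rounds(file_bytes: int, mss: int = 1460, initcwnd: int = 3) -> int:
    """Replay the per-RTT breakdown above: each round sends up to CWND
    segments, then the CWND doubles. RTT 1 is the three-way handshake.
    Returns the total number of RTTs taken."""
    remaining = -(-file_bytes // mss)  # ceiling division: 29 segments for 40k
    cwnd, rtt = initcwnd, 1
    while remaining > 0:
        rtt += 1
        sent = min(cwnd, remaining)
        remaining -= sent
        print(f"RTT {rtt}: cwnd={cwnd}, send {sent} segments")
        cwnd *= 2
    return rtt

total = slow_start_rounds(40_960)
print(f"total: {total} RTTs = {total * 300} ms at 300 ms RTT")
```

Running this reproduces the five-round schedule in the list above.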
Mike Belshe, at Google, did an interesting study. They looked at page load times, for real sites, as a function of bandwidth and as a function of latency. What they found was that, between 1Mbit and 5Mbit, page load time dropped rapidly as bandwidth went up. However, once you pass the 5Mbit mark, the returns are small. In fact, at 10Mbit, we tend to use only around 16% of the available bandwidth. If we look at page load time as a function of latency, though, load times drop linearly as the latency drops. Put another way, reducing latency always gives us a proportional drop in page load time, whereas extra bandwidth only helps us up to a point.
One other thing to note is that, with TCP slow start, as soon as the sender sees a dropped packet, it flips from exponential growth of the CWND (slow start) to linear growth (congestion avoidance). The assumption is that the packet was dropped due to congestion. However, that may not always be the case. A packet lost due to a transient error could cause us to prematurely flip to congestion avoidance, thereby restricting the bandwidth we use to less than what is actually available. Radio media are much more susceptible to random errors and interference. I wonder how often bandwidth is unnecessarily restricted by a slightly noisy network.
A Note On $request_time
When we want to monitor and when we want to optimise, the tendency is to focus on the server side. How long are the requests taking? Make the code faster, increase the number of workers, allocate more memory – whatever is needed. You might call this the content generation time. It is important to remember, though, that this is not the page load speed that the client sees. Theo Schlossnagle wrote a great post about this and put it better than I ever could. The web server can only account for the time between when the HTTP request is received and when the last byte of the response is sent. There is a little latency before this (between the client sending the request and the server receiving it) and at the end (between the last byte being sent by the server and being received by the client).
The time recorded by Apache or NGINX is not the time the page took to load from the user's perspective. This is the definition of $request_time from the NGINX docs:
time elapsed between when the first bytes were read from the client
and the log write after the last bytes were sent to the client
So, you are measuring the time that it took on the server side to deal with the request. This does not include any delay due to a DNS lookup, and it does not include the time taken for the TCP handshake. $request_time is not the request time from the user's perspective.
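If you want server-side numbers anyway, it is worth logging $request_time explicitly so you at least know precisely what you are measuring. A minimal NGINX sketch (the format name and log path here are illustrative, not prescribed):

```nginx
# Log the server-side request time alongside each request.
log_format timing '$remote_addr "$request" $status '
                  'req_time=$request_time';

access_log /var/log/nginx/timing.log timing;
```

Just keep in mind, per the above, that the figure excludes DNS, the TCP handshake, and the final leg of delivery to the client.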
What Can You Do?
Latency has a big effect on page load times. What can we do about it?
1. Send less
RTTs make a huge difference to site load time. Given the bursty nature of HTTP requests, TCP slow start makes it hard to reach the maximum bandwidth available to a device. So, require fewer RTTs by sending less. This could mean reducing the actual amount of content you send (make your pages smaller, or send less to mobile devices). Another option is to compress parts of the site (e.g., CSS, JS, etc.), though there may be a CPU tradeoff. Not only will your page load faster, it will also be cheaper for your users.
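As a quick illustration of how much compression can shave off the wire, here is some repetitive markup run through gzip (the sample HTML is made up; real CSS/JS compresses similarly well):

```python
import gzip

# Compressing repetitive text cuts the bytes on the wire, and therefore
# the number of slow start rounds needed to deliver them.
html = ("<div class='row'><span>item</span></div>\n" * 500).encode()
compressed = gzip.compress(html)

print(f"{len(html)} bytes -> {len(compressed)} bytes "
      f"({len(compressed) / len(html):.1%} of original)")
```

Fewer bytes means fewer segments, which means fewer CWND-limited rounds before the transfer completes.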
2. Raise INITCWND
Make sure you are using a recent kernel. In newer kernels (>2.6.39), the INITCWND is raised from 3 to 10. If you run through the slow start exercise again with an INITCWND of 10, downloading the 40k file takes only 900ms (3 RTTs), compared to 1,500ms when the INITCWND is 3.
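A quick standalone check of that arithmetic, comparing the two INITCWND values:

```python
def rtts_to_send(file_bytes: int, mss: int, initcwnd: int) -> int:
    """RTTs for handshake plus data, with the CWND doubling each round."""
    segments, cwnd, rtts = -(-file_bytes // mss), initcwnd, 1
    while segments > 0:
        segments -= min(cwnd, segments)
        cwnd *= 2
        rtts += 1
    return rtts

for icw in (3, 10):
    print(f"INITCWND={icw}: {rtts_to_send(40_960, 1460, icw)} RTTs")
# INITCWND=3: 5 RTTs (1,500 ms at 300 ms RTT); INITCWND=10: 3 RTTs (900 ms)
```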
3. Put the data closer to the user
Even if we make the cables shorter or faster, at best, we are only going to get a marginal performance improvement. So, put the data closer to the user. Use a CDN to serve the less variable parts of your site.
One thing to watch out for is the cost of an uncached request when using a CDN. The client will have to wait for one RTT (of latency between client and CDN) for connection setup to the CDN. The CDN will then have to connect back to the origin (your server), which will cost you another RTT (of latency between the CDN and your server). So, there is a double TCP handshake penalty for uncached items.
4. Slow start restart
When a TCP connection is idle, the CWND shrinks quite aggressively. This means that after a very short idle time, we could end up effectively going through slow start again. Normal congestion avoidance should be enough to deal with any congestion that might have cropped up while a connection has gone idle. You can disable this behaviour by setting:
net.ipv4.tcp_slow_start_after_idle = 0
5. TCP Fast Open
TFO is an enhancement to help accelerate the three way handshake for clients which have recently connected to a server. When setting up a connection for the first time, the server sends a cookie to the client. The client can send this cookie to the server when setting up subsequent connections. If the cookie is correct, the server will start sending data back to the client immediately after the SYN/ACK. The client must still send the final ACK back to the server, but it can happen at the same time as data transmission, thereby cutting down the delay caused by the three way handshake. On Linux (kernel 3.7 or later), the relevant sysctl is:
net.ipv4.tcp_fastopen
6. Connection re-use
TCP connections are expensive to set up, so it makes sense to hang on to them and re-use them. This is common practice now, but you should look at the timeout for these connections on the server side. For example, in Apache, the KeepAliveTimeout setting defaults to 5 seconds. This is quite low, but you have to balance the performance benefit for the end user against the overhead these persistent connections impose on the server side.
Latency matters. It arguably has a bigger impact on the loading time of sites than bandwidth does. This is especially true as links faster than 5Mbit become more common. When we make changes, we need to ensure that we’re measuring the right thing.
If you’re interested in this area, then Ilya Grigorik’s book “High Performance Browser Networking” is a great read, and, what’s more, it’s available to read for free online. If you prefer video, check out this talk and these accompanying slides.