Monday, January 31, 2011

Caching Techniques for Client Side Web Code

What has changed in Web 2.0?
One of the major changes in Web 2.0 applications is the volume of code being pushed to the browser for execution. Previously, nearly everything ran on the server, and only the code required to render a snapshot of the user interface was sent to the browser. With the upsurge in Web 2.0 sites like Facebook, Google Maps, and Gmail, users have grown to expect a rich experience when surfing the web. This requires the browser to hold all the code needed not only to render the user interface, but also to build it and interact with the user dynamically.

How are page load times impacted?
The initial load time tends to increase, as there are more HTTP requests to make and more data to download. So how do we address this? Ideally, a client would download each file only once. If a file is ever updated, the client would magically recognize that the copy on the server has changed and request the new one.

Doesn't the HTTP protocol take care of this for me?
The HTTP protocol has some wonderful features built in to deal with this. Here is a great Caching Tutorial by Mark Nottingham that covers HTTP caching in manageable terms. There are, however, a couple of problems with relying on pure HTTP caching techniques.
Scenario 1: The server directs the client to use a particular resource for x amount of time. (Expires: Fri, 30 Oct 1998 14:19:41 GMT)
In this case we run the risk of problems when new code is released. The client blindly uses the cached copy until the expiration time is reached. That cached code might not work properly with updated services or with other pieces of code that do not share the same expiration date.
Scenario 2: The server tells the client when a file was last modified. (Last-Modified: Fri, 21 Nov 2008 01:03:33 GMT)
In this case the client asks the server whether the file has changed every time a page requests it, by sending that date back in an If-Modified-Since header. The file is not downloaded again unless it has changed; otherwise the server replies with a short 304 Not Modified. However, these extra round trips still delay page loading, not to mention the additional server and network overhead.
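To make Scenario 2 concrete, here is a minimal Python sketch of the decision a server makes on each of those repeat requests. The function name status_for_request is made up for illustration, and a real web server does this for you; the point is only that the check costs a round trip even when the answer is "nothing changed".

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def status_for_request(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Hypothetical helper: return 304 if the client's copy is current, else 200."""
    if if_modified_since is None:
        return 200  # nothing cached yet, so send the full file
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to sending the full file
    if cached_at.tzinfo is None:
        # Treat a date without a usable zone as GMT (an assumption of this sketch).
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= cached_at:
        return 304  # Not Modified: headers only, no body
    return 200


# Example: the file was last changed at the time used in Scenario 2.
changed = datetime(2008, 11, 21, 1, 3, 33, tzinfo=timezone.utc)
header = format_datetime(changed, usegmt=True)
print(header, status_for_request(header, changed))
```

Even the 304 path requires a full request and response, which is the overhead described above.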

Can't we just rename the files?
Many people simply use a large Expires header value and rename their files whenever a file or resource changes, but this introduces its own issues. Take source control into account: it would be difficult to manage our code if our file names kept changing.

So what should we do?
Using a large Expires header + renaming files is very close to the correct solution; it's just implemented incorrectly. Rather than changing the actual name of the file, we just request the file in a different way.
Example: <script src="js/myCode.js?v=2010.02.17"></script>
When a new version of the code is released, we call it using a new query string value.
Example: <script src="js/myCode.js?v=2011.01.31"></script>
Because the browser cache keys each entry on the full URL, query string included, the previously cached URL will be ignored and the new URL will be requested from the server. This technique lets us keep file names stable under version control, and it also lets us introduce new code at any time without worrying about whether the client cache will cause a problem.
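If your pages are generated by a script or template, the version string can be appended in one place instead of by hand. The following is a small Python sketch of that idea; the helper name versioned_url is hypothetical, and it assumes the version lives in a query parameter named v, as in the examples above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, version: str) -> str:
    """Hypothetical helper: return url with its v= query parameter set to version."""
    parts = urlsplit(url)
    # Keep any other query parameters, but drop an existing v= value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "v"]
    query.append(("v", version))
    return urlunsplit(parts._replace(query=urlencode(query)))


# The file on disk keeps its name; only the URL the browser sees changes.
print(versioned_url("js/myCode.js", "2011.01.31"))
print(versioned_url("js/myCode.js?v=2010.02.17", "2011.01.31"))
```

Both calls produce the same URL, so it does not matter whether the reference was already versioned.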

Things to recognize:
First, we must update every place we reference a cached file when a new version is introduced. This should not be too difficult as most Web 2.0 applications have only a few HTML pages, but it must be stated.
Second, this does not apply to HTML files. HTML files should rely only on the Last-Modified HTTP caching header, as any update to these files should be visible to the browser immediately.
Third, this technique can be used anywhere you reference a common resource.
<link rel="stylesheet" type="text/css" href="css/myStyle.css?v=2009.08.23" />
background-image: url(../img/myIcons.gif?v=2009.11.24);
You get the idea...
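As for the first point, updating every reference by hand is easy to get wrong, so it may be worth scripting. Here is a rough Python sketch that rewrites the version on one asset inside a chunk of HTML or CSS text. The function name bump_version is made up for illustration, and the sketch assumes v is the only query parameter on the reference.

```python
import re


def bump_version(markup: str, asset: str, new_version: str) -> str:
    """Hypothetical helper: point every reference to asset at new_version."""
    pattern = re.compile(
        re.escape(asset)
        + r"(?:\?v=[^\"')\s&]*)?"  # an existing ?v=... value, if any
        + r"(?=[\"')\s]|$)"        # must be followed by a quote, ')', space, or end
    )
    # A function replacement avoids backslash surprises in new_version.
    return pattern.sub(lambda match: f"{asset}?v={new_version}", markup)


html = '<script src="js/myCode.js?v=2010.02.17"></script>'
css = "background-image: url(../img/myIcons.gif?v=2009.11.24);"
print(bump_version(html, "js/myCode.js", "2011.01.31"))
print(bump_version(css, "img/myIcons.gif", "2011.01.31"))
```

The trailing lookahead keeps a reference like js/myCode.json from being touched when the asset is js/myCode.js.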
Happy caching