Wednesday, November 2, 2011

Optimize Web Apps with PNG

When creating a web application for mobile devices, there are two additional challenges everyone has to deal with: low throughput and high latency. How can you reduce the size of the code and the number of HTTP requests at the same time?

By combining a couple of great concepts, I have created a single solution that can reduce file sizes by 50-80% and cut HTTP calls down to the bare minimum: 1 HTML file and 1 PNG data file. Sound too good to be true? Here are the steps I have put together to achieve the desired outcome.
  1. Merge & Compress JavaScript Files
  2. Merge & Compress CSS + Image Files
  3. Merge & Compress JS + CSS into Single PNG File
  4. Extract Original Assets from PNG File in Browser
I have written a relatively small bit of JavaScript code (3 kB) for decoding the PNG file on the browser. This ties the whole thing together, creating an extremely lightweight mechanism for transporting all required assets from the server to the client.

Process Overview

Let's take a look at each step in this process...

  1. The client browser must support the <canvas> tag. (For most mobile devices, such as iPhone, iPad and Android devices, this is not a problem.)
  2. Only image files referenced in your CSS will be compiled in the build process. Images referenced by an <IMG> tag will still work, but will not be included in this single file solution.
  3. A batch file will be used to build and execute steps 1-3.
  4. We will use this simple folder structure for this project.

Step 1: Merge & Compress JavaScript Files

Most well-written applications have many JavaScript files that have been logically separated for ease of support and maintenance.
So, the first thing we need to do is combine all of these JS files into a single file called dev.js. This file is not compressed or minified in any way, which makes it ideal for debugging your application.

The next thing we need to take care of is applying the actual minification that optimizes the code for production use. I have chosen to use YUI Compressor to take care of this, but there are many other great tools that could be used. (JSMin or Packer, to name a couple.)
* It's very important that you follow JavaScript syntax best practices to ensure no issues are introduced by the minification process. See Douglas Crockford's JSLint tool for more details.
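As an illustration of why strict syntax matters (my own example, not from any tool's documentation): a newline in the wrong place can silently change what your code means, and minifiers that rejoin or strip lines make such code even more fragile.

```javascript
// A classic pitfall: a newline after `return` triggers automatic semicolon
// insertion, so the object literal below is never returned.
function broken() {
  return
  { value: 1 }; // unreachable; parsed as a block, not a return value
}

// Keeping the value on the same line as `return` avoids the problem.
function fixed() {
  return { value: 1 };
}
```

Running code like this through JSLint before minifying catches these traps early.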

Here is the BAT file code I'm using to accomplish the tasks involved in this step.
::Merge all JS files
ECHO. > dev.js
FOR /F %%v IN ('dir js\*.js /b') DO (
 type js\%%v >> dev.js
)

::Minify JS file
java -jar tools\yuicompressor-2.4.6.jar --preserve-semi --type js dev.js > prod.js

Step 2: Merge & Compress CSS + Image Files

A common practice used by advanced web developers to minimize browser HTTP requests is called CSS sprites. With CSSEmbed, we can accomplish the same outcome without all the CSS image-mapping difficulties. This step in the build process makes CSS sprites a thing of the past by embedding the images directly into the CSS.

Similar to the JavaScript files above, we merge all the CSS files into a single dev.css file. Once we have this, we embed the images using CSSEmbed and compress the resulting file with YUI Compressor to create our prod.css file.
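To make the embedding idea concrete, here is a rough sketch of the transformation CSSEmbed performs (embedImages and readFile are hypothetical names; CSSEmbed's real implementation differs): each url(...) reference is replaced by the image bytes themselves, base64-encoded as a data URI.

```javascript
// Hypothetical sketch of data-URI embedding: rewrite url(...) references
// so the images travel inside the CSS file instead of as separate requests.
function embedImages(css, readFile) {
  return css.replace(/url\(\s*['"]?([^'")]+)['"]?\s*\)/g, function (match, path) {
    var bytes = readFile(path); // Buffer with the image contents, or null
    if (!bytes) { return match; } // leave unresolvable references untouched
    return 'url(data:image/png;base64,' + bytes.toString('base64') + ')';
  });
}
```

The trade-off is that base64 inflates the bytes by about a third, which gzip (or the PNG packing below) largely claws back.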

Here is the BAT file code I'm using to accomplish the tasks involved in this step.

::Merge all CSS files
ECHO. > dev.css
FOR /F %%v IN ('dir css\*.css /b') DO (
 type css\%%v >> dev.css
)

::Embed images in CSS file
java -jar tools\cssembed-0.4.0.jar dev.css > cmp.css

::Minify CSS file
java -jar tools\yuicompressor-2.4.6.jar --type css cmp.css > prod.css

::Cleanup temp file
del cmp.css

Step 3: Merge & Compress JS + CSS into Single PNG File

This technique was first introduced back in 2008 by Jacob Seidelin, and it's where all the magic happens. I have written a small .NET application that embeds multiple JS, CSS, HTML and XML files into a single PNG file. To further optimize the image created, I'm using a great tool called PNGCrush to squeeze every last byte out of the data image.
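Conceptually, the packing works by storing each byte of the text payload as a pixel value; because PNG compression is lossless, the script survives the round trip intact. Here is a rough sketch of that mapping (textToPixels and pixelsToText are hypothetical names, not the actual file2png implementation):

```javascript
// Each character of the payload becomes one grayscale byte in the image data.
function textToPixels(text) {
  var pixels = [];
  for (var i = 0; i < text.length; i++) {
    pixels.push(text.charCodeAt(i) & 0xFF); // one byte per character
  }
  return pixels; // a PNG encoder then writes these bytes as image rows
}

// The inverse mapping recovers the original text from the pixel bytes.
function pixelsToText(pixels) {
  var chars = [];
  for (var i = 0; i < pixels.length; i++) {
    chars.push(String.fromCharCode(pixels[i]));
  }
  return chars.join('');
}
```

PNG's DEFLATE compression is what makes this pay off: minified JS and CSS are highly repetitive, so they shrink dramatically inside the image.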

Here is the BAT file code I'm using to accomplish the tasks involved in this step.
::Prep CSS data
ECHO ^<File2PNG type="text/css"^> > data.dat
type prod.css >> data.dat
ECHO ^</File2PNG^> >> data.dat

::Prep JS data
ECHO ^<File2PNG type="text/javascript"^> >> data.dat
type prod.js >> data.dat
ECHO ^</File2PNG^> >> data.dat

::Generate PNG file
tools\file2png data.dat data.png >> NUL

::Optimize PNG file
tools\pngcrush -rem alla -c 0 -q data.png prod.png

::Cleanup temp files
del data.dat
del data.png

Step 4: Extract Original Assets from PNG File in Browser

Now that we have all the server assets compressed into a single PNG file and optimized for HTTP delivery, we need a way to extract the data in the browser. I have written a small bit of code that dynamically loads the data found in the PNG file, and gracefully falls back to support devices that don't properly implement the <canvas> tag.
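Browser-side, the trick is to draw the PNG onto a <canvas>, call getImageData(), and map the pixel bytes back to characters. A minimal sketch of that last step (extractText is a hypothetical name; the real file2png code also parses the <File2PNG> wrappers to split the payload back into CSS and JS):

```javascript
// getImageData() returns a flat RGBA array, so for grayscale data the
// payload byte repeats in R, G and B with stride 4; read the red channel.
function extractText(rgba, length) {
  var chars = [];
  for (var i = 0; i < length; i++) {
    chars.push(String.fromCharCode(rgba[i * 4])); // red channel carries the byte
  }
  return chars.join('');
}
```

The payload length must be carried separately (for example in the first few pixels), since the image is padded out to a full rectangle.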

Here is an example of how you can gracefully load data in the browser using the file2png code that I have created.
file2png.onInit = function () {

 var rev = '2011-10-21',
     planB = function () {
      file2png.loadStyle('prod.css?v=' + rev);
      file2png.loadScript('prod.js?v=' + rev);
     };

 if (file2png.supported) {
  file2png.loadData('prod.png?v=' + rev, function (obj) {
   if (obj.error) {
    // PNG decoding failed; fall back to the individual files
    planB();
   }
  });
 } else {
  // No usable <canvas> support; load the files the old way
  planB();
 }
};

I have created a sample application that uses this process, and I have packaged up the code used to compile it. Additionally, I have added some error detection to the batch file for good measure. Feel free to play with this and provide feedback, as I'm always excited to improve this process.

Monday, January 31, 2011

Caching Techniques for Client Side Web Code

What has changed in Web 2.0 ?
One of the major changes in Web 2.0 applications is the volume of code being pushed to the browser for execution. Previously, everything ran on the server and only the code required to render a snapshot of the user interface was sent to the browser. With the upsurge of Web 2.0 sites like Facebook, Google Maps, Gmail, etc., users have grown to expect a rich user experience when surfing the web. This requires the browser to contain all the code needed not only to render the user interface, but to build it and dynamically interact with the user.

How are page load times impacted?
The initial load time will increase, as there are more HTTP requests to be made and more data to download. So how do we address this? Ideally, a client would only download a file one time. If an asset is ever updated, the client would magically recognize that the file on the server has changed and request the new version.

Doesn't the HTTP protocol take care of this for me?
The HTTP protocol has some wonderful features built in to deal with this. Here is a great Caching Tutorial by Mark Nottingham that covers HTTP caching in manageable terms. There are, however, a couple of problems with using pure HTTP caching techniques.
Scenario 1: The server directs the client to use a particular resource for x amount of time. (Expires: Fri, 30 Oct 1998 14:19:41 GMT)
In this case, we run the risk of issues when new code is released. The client blindly uses the cached code until the expiration time is reached. This cached code might not work properly with updates to services or other bits of code that do not share the same expiration date.
Scenario 2: The server tells the client when a file was last modified (Last-modified : Mon, 21 Nov 2008 01:03:33 GMT)
In this case, the client will ask the server if the file has changed every time it's requested by a page. The file is not downloaded again unless there is a change. However, the additional round trips to the server will cause unnecessary delays in page loading, not to mention the additional server and network overhead.
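To make Scenario 2 concrete, here is a sketch of the server-side decision (respond is a hypothetical helper, not real server code): the file is only re-sent when the client's If-Modified-Since date is older than the file's timestamp.

```javascript
// Hypothetical sketch of the Last-Modified handshake from Scenario 2.
function respond(fileLastModified, ifModifiedSince) {
  if (ifModifiedSince && Date.parse(ifModifiedSince) >= Date.parse(fileLastModified)) {
    return { status: 304 }; // Not Modified: browser reuses its cached copy
  }
  return { status: 200, lastModified: fileLastModified }; // full download
}
```

Note that even the 304 path costs a full round trip to the server, which is exactly the latency we are trying to avoid on mobile.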

Can't we just rename the files?
Many people simply use a large Expires header value and change the name of their files when a resource changes, but this introduces its own issues. Take source control into account: it would be difficult to manage our code if our file names kept changing.

So what should we do?
Using a large Expires header and renaming files is very close to the correct solution; it's just implemented incorrectly. Rather than changing the actual name of the file, we request the file in a different way.
Example: <script src="js/myCode.js?v=2010.02.17"></script>
When a new version of the code is released, we call it using a new query-string value.
Example: <script src="js/myCode.js?v=2011.01.31"></script>
Because the browser cache keys on the URL's path plus its query string, the previously cached URL will be ignored and the new URL will be requested from the server. This technique allows us to easily version our client-side code, and it lets us introduce new code at any time without concern for whether the client cache will cause a problem.
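The versioning scheme above can be wrapped in a tiny helper so the revision string lives in one place (versioned is a hypothetical name for illustration):

```javascript
// Hypothetical helper: append the release version as a query string so a
// new release busts the browser cache without renaming the file itself.
function versioned(url, rev) {
  return url + (url.indexOf('?') === -1 ? '?' : '&') + 'v=' + rev;
}
```

Bumping rev once per release then invalidates every cached asset in a single edit.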

Things to recognize:
First, we must update every place a cached file is referenced when a new version is introduced. This should not be too difficult, as most Web 2.0 applications have only a few HTML pages, but it must be stated.
Second, this does not apply to HTML files. HTML files should use only the Last-Modified HTTP caching header, as any update to these files should be immediately visible to the browser.
Third, this technique can be used anywhere you reference a common resource.
<link rel="STYLESHEET" type="text/css" href="css/myStyle.css?v=2009.08.23" />
background-image: url(../img/myIcons.gif?v=2009.11.24);
You get the idea...
Happy caching!