Philip Jacob

How to organize your CDN hostnames


I’ve used the following scheme to manage my CDN hostnames for the past few years, and I find it particularly clean and easy to work with.  The scheme has no ties to any software platform, framework or CDN, though I think it would be quite cool if Web framework designers built in support for it.  For what it’s worth, I’ve used it on two Java sites and one Rails site, with CDNs ranging from Akamai to Amazon.  It’s simple, so follow along.

All of your CDN hostnames will follow this scheme:

type-serial.environment.something.tld

The components are:

  1. type: typical values include jscss, product-images and avatars.  Basically, you put a whole class of content on one hostname.  These divisions are sometimes arbitrary and sometimes driven by performance or architecture.  For example, if you run an e-commerce site, you may keep your product images in a big S3 bucket that you want to front with a CDN.  You may want your JavaScript and CSS served off one host.  Either way, this is your chance to make functional groupings.
  2. serial: You start with 0.  If you have a website that tends to present many images on one page, it can be beneficial for performance to serve those images off several hostnames.  A serial field is also useful if you migrate from one CDN provider to another, since you can just bump the serial number during the migration.  These are nuances, so I will come back to them shortly.  But if you run a small site, the value for ‘serial’ is 0.
  3. environment: typical values are dev, integration, qa, staging and prod.  Obviously, you will want to use your own terminology.
  4. something.tld: it is generally a good idea to serve your CDN-accelerated assets from a domain name that is different from your main website’s.  For example, if your site is www.something.com, you should buy another domain like something-static.net.  The main reason is cookies: if your CDN hostnames sat under your main domain, cookies set for that domain could be sent along with every request for a static file, and files that don’t differ from one visitor to another have no use for them.  Keeping the domains separate brings both performance benefits (no HTTP overhead from sending useless cookies) and security benefits (your visitors’ cookies are not exposed if a host on your CDN’s network gets cracked).

And that’s it.
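Because every component has a fixed position, building a hostname is mechanical. Here is a minimal Python sketch; the names cdn_hostname and LABEL are hypothetical, and the validation rule (lowercase letters, digits and internal hyphens for the type and environment) is my own assumption rather than part of the scheme.

```python
import re

# Assumed rule: lowercase alphanumeric runs joined by single hyphens,
# so multi-word types such as "blog-images" are allowed.
LABEL = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def cdn_hostname(content_type: str, serial: int, environment: str, domain: str) -> str:
    """Build a hostname of the form type-serial.environment.something.tld."""
    if not LABEL.fullmatch(content_type):
        raise ValueError(f"bad content type: {content_type!r}")
    if not LABEL.fullmatch(environment):
        raise ValueError(f"bad environment: {environment!r}")
    if serial < 0:
        raise ValueError("serial must be non-negative")
    return f"{content_type}-{serial}.{environment}.{domain}"


if __name__ == "__main__":
    # Prints one production hostname per content type.
    for content_type in ("jscss", "avatars", "blog-images"):
        print(cdn_hostname(content_type, 0, "prod", "whirlycdn.net"))
```

Centralizing this in one function means templates never hand-assemble hostnames, so a typo in an environment name fails loudly instead of producing a broken URL.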

When you put it all together, you might end up with something like this for your production hostnames (I’ll use this domain, whirlycott.com, as the example site, with whirlycdn.net as its separate CDN domain):

jscss-0.prod.whirlycdn.net
avatars-0.prod.whirlycdn.net
blog-images-0.prod.whirlycdn.net

Your dev, qa and staging hostnames are easy to guess from this scheme, so I shall avoid repeating them.
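The fixed layout also works in reverse, which helps when you are reading logs or deriving one environment’s hostname from another’s. A minimal sketch with hypothetical names (parse_cdn_hostname, CdnHost); it assumes the environment is a single DNS label and that everything after the second dot is the domain.

```python
from typing import NamedTuple


class CdnHost(NamedTuple):
    content_type: str
    serial: int
    environment: str
    domain: str

    def __str__(self) -> str:
        # Reassemble the components in type-serial.environment.domain order.
        return f"{self.content_type}-{self.serial}.{self.environment}.{self.domain}"


def parse_cdn_hostname(hostname: str) -> CdnHost:
    """Split type-serial.environment.something.tld into its components.

    Raises ValueError if the hostname does not follow the scheme.
    """
    # At most two splits: first label, environment, and the rest as the domain.
    # Unpacking raises ValueError if there are fewer than two dots.
    first_label, environment, domain = hostname.split(".", 2)
    # The type may contain hyphens ("blog-images"), so split on the last one.
    content_type, sep, serial = first_label.rpartition("-")
    if not sep or not content_type or not serial.isdigit():
        raise ValueError(f"not in type-serial form: {first_label!r}")
    return CdnHost(content_type, int(serial), environment, domain)


if __name__ == "__main__":
    prod = parse_cdn_hostname("blog-images-0.prod.whirlycdn.net")
    print(prod)
    # Derive the qa counterpart by swapping one component.
    print(prod._replace(environment="qa"))
```

Splitting the first label with rpartition is what lets hyphenated types survive the round trip.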

I mentioned a nuance in relation to the serial number field.  If you find yourself generating web pages with lots of, say, images, you can easily split your content across multiple hostnames:

product-images-0.prod.foocdn.net
product-images-1.prod.foocdn.net
product-images-2.prod.foocdn.net
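Each of these hostnames typically needs its own DNS record and CDN configuration entry, so it helps to generate the list from a shard count rather than typing it out. A minimal sketch; shard_hostnames is a hypothetical helper name, and it needs Python 3.9+ for the list[str] annotation.

```python
def shard_hostnames(content_type: str, shards: int, environment: str, domain: str) -> list[str]:
    """Return the hostnames for serials 0 .. shards-1 of one content type."""
    if shards < 1:
        raise ValueError("need at least one shard")
    return [f"{content_type}-{serial}.{environment}.{domain}" for serial in range(shards)]


if __name__ == "__main__":
    # Prints the three product-images hostnames listed above.
    for host in shard_hostnames("product-images", 3, "prod", "foocdn.net"):
        print(host)
```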

The advantage here is that browsers will typically download from multiple hostnames faster than from a single one, because they limit the number of simultaneous connections they open to any one hostname (don’t go crazy with this and generate a hundred hostnames).  Of course, each extra hostname costs an extra DNS lookup, so you have to weigh that.  When you generate the serials for your assets, I recommend always generating the same serial number (and therefore the same hostname) for a given piece of content.  If you have a website with pictures of butterflies, you might have a bunch of JPEGs served like this (note the alternating serial numbers):

http://animal-pics-0.prod.mycdn.net/blue-butterfly.jpg
http://animal-pics-1.prod.mycdn.net/green-butterfly.jpg
http://animal-pics-0.prod.mycdn.net/red-butterfly.jpg
http://animal-pics-1.prod.mycdn.net/yellow-butterfly.jpg

If you have fifty photos per page on your site, you should ideally generate the same hostname for a given image every time it appears, to improve cacheability (there may also be some tangential benefits for Google image search).  Let’s say you want browsers to download animal-pics from two hostnames.  In this case, use a standard hash/mod approach to spread your assets roughly uniformly across the two hostnames.  Note that you will need to do this server-side.  In Python, you’d do it like this:

>>> import hashlib
>>> h = hashlib.sha1()  # avoid shadowing the built-in hash()
>>> h.update(b"blue-butterfly.jpg")  # Python 3 needs bytes, not str
>>> result = h.hexdigest()
>>> result
'775da2f0b764b712b7c3615f479794e0095cc8ce'
>>> serial = int(result, 16) % 2
>>> serial
0
>>>

SHA-1 produces a 160-bit digest.  hexdigest() returns it as a hex string, int(result, 16) turns that into an ordinary integer, and Python handles integers that large automatically.  In Java, you have to use a BigInteger (with something like DigestUtils from Apache Commons Codec to compute the hash) to coax it into something you can do actual math with.  In this case, for blue-butterfly.jpg, the correct serial is 0.  If you repeat this test in your Python REPL using “green-butterfly.jpg”, you will notice that the serial number is 1.
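Wrapped up as functions, the REPL session above turns into something a template can call for every asset. This is a sketch under my own assumptions: shard_for and asset_url are hypothetical names, and the defaults (prod, mycdn.net) simply mirror the butterfly example.

```python
import hashlib


def shard_for(path: str, shards: int) -> int:
    """Map an asset path to a stable serial in range(shards).

    Uses SHA-1 rather than Python's built-in hash(), which is randomized
    per process for strings and would give the same image a different
    hostname after every restart.
    """
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % shards


def asset_url(path: str, content_type: str, shards: int,
              environment: str = "prod", domain: str = "mycdn.net") -> str:
    """Build the full CDN URL for an asset, always on the same hostname."""
    serial = shard_for(path, shards)
    return f"http://{content_type}-{serial}.{environment}.{domain}/{path}"


if __name__ == "__main__":
    for name in ("blue-butterfly.jpg", "green-butterfly.jpg",
                 "red-butterfly.jpg", "yellow-butterfly.jpg"):
        print(asset_url(name, "animal-pics", 2))
```

One caveat of plain hash/mod: if you later change the number of shards, most assets move to a different hostname (going from two shards to three keeps only about a third of them in place), which throws away cached copies. It pays to pick the shard count up front.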

What I like about this layout is that it scales well and is easy to understand, easy to debug and simple to implement.  You do, however, end up with a proliferation of hostnames, but if your site is successful you will want something closely resembling this anyway, so take the extra hour to set it up the right way.  I like to avoid doing the same thing twice.