Cost of Optimization is Greater than the Cost of Synchronization

· Philip Jacob

Prof. Bill Pugh is a guy I’ve been paying attention to lately. He’s on the expert group for JSR-133, which describes the new memory model and thread specification in JDK 1.5, which is obviously of interest to someone working on a high-throughput Java object cache. He’s also responsible for one of my favorite Eclipse plugins, FindBugs.

If you start reading about thread synchronization in Java, you will quickly bump into his presentation entitled “Is Code Optimization Relevant?”, in which he humorously notes:

Cost of stupidity higher than cost of synchronization

Generally, I think this is true, mainly because people don’t seem to understand synchronization very well, judging by what I’ve been seeing in various codebases lately. I suspect the unintended result of this claim in Bill Pugh’s presentation is that some developers just throw up their hands and say, “The cost of stupidity is higher than the cost of synchronization, so when I’m programming a multithreaded application, I’ll just synchronize all the methods in certain classes!”.

In many cases, this works out to be a not-entirely-unreasonable approach, especially when you’re working under time pressure or cost pressure and just need to ship something that functions properly. However, when you’re writing components that are going to be used by lots of people in various applications, you have to approach the problem differently.
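To make that concrete, the synchronize-everything reflex looks something like this (an illustrative sketch, not code from any particular project):

import java.util.HashMap;
import java.util.Map;

// Correct and simple, but every call funnels through a single monitor,
// so threads queue up behind each other under load.
public class SynchronizedEverythingCache {
    private final Map map = new HashMap();

    public synchronized void put(final Object key, final Object value) {
        map.put(key, value);
    }

    public synchronized Object get(final Object key) {
        return map.get(key);
    }

    public synchronized Object remove(final Object key) {
        return map.remove(key);
    }
}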

Case in point: Whirlycache creates a little Item object every time an object is inserted into the cache, basically like this:

public void store(Object storeme) {
    // every insertion allocates a fresh, short-lived wrapper object
    new Item(storeme);
}

When the object is removed from the cache, the Item that wrapped it becomes unreachable and therefore eligible for garbage collection by the JVM. This is not desirable behaviour: the constant creation of transient objects fragments the heap and causes the garbage collector to run more frequently than it otherwise would.

So, thinking I was smart, I thought I would just store these Item objects in an object pool, specifically one based on Commons Pool. I figured I’d write a little quickie test first, though, to see how well things were performing. Here’s the tradeoff:

  1. Doing lots of new Item(), thereby causing heap fragmentation and excessive garbage collection versus
  2. Whatever the overhead is in speed for using an object pool

My tests were kind of amazing, actually. If there is one thing that can be said for new Object(), it’s that it is fast: 20 threads can create 1M Item objects in 2,113ms (with no synchronization at all). Using the GenericObjectPool from Commons Pool, the number jumped to 195,661ms! In other words, that’s roughly a 100x slowdown for the tradeoff. Hmm… doesn’t sound so appealing. After examining Commons Pool a bit more, I can see why it’s so slow: lotsa synchronization. I’m not sure how much of it is avoidable.
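For reference, the allocation side of that test can be approximated like this (a sketch of the kind of harness I mean, not the exact test code; Item is the wrapper class from above):

public class AllocationBench {
    private static final int THREADS = 20;
    private static final int PER_THREAD = 1000000 / THREADS; // 1M total

    public static void main(final String[] args) throws InterruptedException {
        final Thread[] workers = new Thread[THREADS];
        final long start = System.currentTimeMillis();
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < PER_THREAD; i++) {
                        new Item(new Object()); // allocate and discard
                    }
                }
            });
            workers[t].start();
        }
        for (int t = 0; t < THREADS; t++) {
            workers[t].join(); // wait for all allocators to finish
        }
        System.out.println((System.currentTimeMillis() - start) + "ms");
    }
}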

Keep in mind that if you are doing general applications, you ought to be using Commons Pool. From what I’ve been able to determine in the short time I’ve been examining the CVS repository, it is maintained, well-designed and flexible. This is not an indictment of Commons Pool in any way. Please consider using this package the next time you need an object pool.
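For reference, borrowing and returning an object with Commons Pool looks roughly like this, assuming the 1.x API and reusing the Item class from above (error handling elided):

import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;

public class ItemPoolExample {
    public static void main(final String[] args) throws Exception {
        // The factory tells the pool how to create instances on demand.
        GenericObjectPool pool = new GenericObjectPool(
                new BasePoolableObjectFactory() {
                    public Object makeObject() {
                        return new Item(new Object()); // placeholder payload
                    }
                });

        Object item = pool.borrowObject(); // synchronized inside the pool
        try {
            // ... use the pooled Item ...
        } finally {
            pool.returnObject(item); // also synchronized
        }
    }
}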

However, my motivation for an object pool is specifically for use in Whirlycache, and the slower a cache becomes, the less useful it is. My experience building the backend of Whirlycache led me to Doug Lea’s wonderful util.concurrent package, which really opened my eyes to some super techniques for writing code designed for use in a multithreaded context. Using some of the ideas I saw there, I decided to see if I could do better than Commons Pool. That wasn’t very hard, but it did involve focusing entirely on the cost of synchronization. Here’s the general approach:


private Item[] elements;                // the pooled Item objects
private int size;                       // elements.length
private java.util.Random random = new java.util.Random();
// hint written by release(); volatile so waiting threads see it promptly
private volatile int possiblyAvailableIndex = 0;

public Item acquire() {
    int i = possiblyAvailableIndex;
    while (!elements[i].lock()) {
        // The hinted slot was taken. Rather than have 20 threads
        // sequentially chasing each other around the array, jump to a
        // random slot and try again.
        i = random.nextInt(size);
    }
    return elements[i];
}

public void release(final Item _item) {
    _item.unlock();
    if (_item.getQueuePosition() > 0) {
        // Tell searching threads that this slot is probably free.
        possiblyAvailableIndex = _item.getQueuePosition();
    }
}

I introduced a granular lock object in the Item class, which the lock() and unlock() methods synchronize on, along with a field that stores each Item’s position in the array.
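For illustration, the relevant parts of Item might look like this; lock(), unlock() and getQueuePosition() are the names used above, and everything else is my assumption about a plausible implementation:

// Fragment of Item: only the locking machinery is shown.
private final Object lockObject = new Object(); // granular per-Item lock
private boolean locked = false;
private int queuePosition; // this Item's index in the pool's array

// Non-blocking: returns false immediately if another thread holds the lock.
public boolean lock() {
    synchronized (lockObject) {
        if (locked) {
            return false;
        }
        locked = true;
        return true;
    }
}

public void unlock() {
    synchronized (lockObject) {
        locked = false;
    }
}

public int getQueuePosition() {
    return queuePosition;
}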

Any call to acquire() basically zooms through an array of Item objects trying to get a lock on one. If the lock succeeds, the object is returned. The starting point for iterating through the array is set by the release() method, which basically tells all threads searching for a lock that one is available at elements[possiblyAvailableIndex]. If that lock isn’t available, rather than having 20 threads sequentially chasing each other around the array, it’s actually faster to have them restart their searches at random locations. This approach runs my test case in 32,638ms: approximately 6x faster than using Commons Pool, but still 15x slower than doing new Item().
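In use, acquire() and release() pair up just like borrow and return did with Commons Pool; assuming pool is an instance of the class sketched above:

Item item = pool.acquire();
try {
    // ... read or populate the Item ...
} finally {
    pool.release(item); // publishes this slot as probably available
}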

Another point worth emphasizing is that my object pool is no longer generic, since it tightly couples the Item class to the pool that serves it. Still, it seemed like an interesting technique, although I’m not certain it is fast enough to tip the balance in the tradeoff one way or the other.

All told, the simple approach of just going with a highly synchronized pool would have been sufficient for most applications. Using more granular locks reduces contention and increases speed, but it takes quite a bit of time to analyze how and where to place them. Measured in developer time, the cost of the optimization is high relative to the benefit it delivers.

My search continues. There must be a way to do this.