Your Email Works… Your Email Works… Your Email Works…

One interesting side effect of having highly accurate anti-spam filters is that I frequently find myself wondering if there is a problem with mail delivery on my servers. I’m flipping over to my inbox more and more only to discover that there are no new messages. The constant stream of spam used to serve as a nagging reassurance that things were indeed working. Without it, how do I know if my filters are operating superbly or if the mail partition is full?

Or maybe my main contacts are just using AIM and IRC for everything…?

Whirlycache 0.6.5 Released

Whirlycache 0.6.5 is now available. Whirlycache is an extremely fast in-memory Java object cache.

New features for this release include:

  • Cocoon integration from Peter Royal (BETA): you can now use Whirlycache as a Store in Cocoon
  • Hibernate integration (BETA): you can now use Whirlycache as a Hibernate cache provider
  • A new Cacheable interface allows cached objects to hook into lifecycle methods to provide customized functionality when store(), retrieve() and remove() events occur (a rough sketch follows below).
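
To make that last bullet concrete, here is a minimal sketch of the idea, assuming a hypothetical interface with callbacks named onStore(), onRetrieve() and onRemove(); the names are illustrative only and are not the actual Whirlycache API, so check the javadoc for the real signatures.

// Hypothetical stand-in for the Cacheable interface; the real interface
// and its method names may differ, so treat this purely as a sketch.
interface CacheableSketch {
    void onStore();
    void onRetrieve();
    void onRemove();
}

// An object that reacts to its own cache lifecycle events.
class AuditedReport implements CacheableSketch {

    private final String name;

    AuditedReport(final String name) {
        this.name = name;
    }

    public void onStore()    { System.out.println(name + " stored"); }
    public void onRetrieve() { System.out.println(name + " retrieved"); }
    public void onRemove()   { System.out.println(name + " removed"); }
}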

Get the 0.6.5 release at: http://whirlycache.dev.java.net/

Eclipse Goodie

I’ve been using this little Eclipse goodie every day since I discovered it earlier in the week. It’s a common occurrence that you have a java.util.Collection object reference and want to iterate over it. Check this out:

Eclipse auto-complete for iteration

In Eclipse, after the object reference was assigned to list, I hit Ctrl-Space and found a wonderful addition to the menu (iterate over collection) that sets up my code with a for() loop and populates everything for me.
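
For reference, the generated loop looks roughly like this (JDK 1.4-era, pre-generics style; the exact variable names Eclipse fills in depend on your code):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IterateExample {
    public static void main(String[] args) {
        List list = new ArrayList();
        list.add("one");
        list.add("two");

        // Roughly what the "iterate over collection" template expands to:
        for (Iterator iter = list.iterator(); iter.hasNext();) {
            Object element = iter.next();
            System.out.println(element);
        }
    }
}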

Try it. You’ll use it all the time.

Love/Hate/qmail

I’ve been a qmail convert since I worked for pate in 1999 when it truly was the best option out there insofar as MTAs were concerned. Maybe it still is. I wish the licensing terms were compatible with the terms that my preferred distribution for Linux-based servers uses.

Qmail, for those of you who don’t know, is a mail server. It receives emails, usually from the network, and processes them by delivering them locally or passing them on to their ultimate destination on a remote host. It’s fast, secure and very stable. Almost no maintenance is required.

These days, however, you need more than just something to process mail: you usually need a virus scanner, anti-spam facilities, some kind of user database that doesn’t require system accounts, integration with IMAP and POP3 servers, support for mailing lists, TLS, SMTP-Auth, and flexibility at every corner for your own customizations.

But it strikes me that at the end of 2004, these needs ought to be commoditized such that a reasonably competent sysadmin can simply install all of the above using their distribution’s package manager. This shouldn’t be hard.

Instead, due to the qmail license, the process requires a certain level of expertise, patience and time. Download qmail, vpopmail, courier, the netqmail patches, vqadmin, qmailadmin, qmail-scanner, ClamAV, and SpamAssassin. Verify compatibility between all of the above, usually by reading the notes buried in the qmail source patches, docs and mailing lists. Oh, and qmail.org is probably one of the most user-unfriendly websites in the world. They claim:

It’s not designed to be easy to use — it’s designed to be comprehensive.

… as if the two were mutually exclusive.

By any measure, the complexity and effort involved is a pain in the neck. And I don’t believe that it provides any real benefit over simply installing prebuilt binary packages.

Your other option is to go and find prebuilt packages made by someone who is in the position you are in. I’ve been down that road before and had to abandon that approach when I needed some kind of special patch to the qmail source.

I have to give a special nod to the traditional UNIX mentality here: do one thing and do it well. That’s what all of the aforementioned packages do. They each represent a best-in-class solution to a specific problem. Integrating them is the trick. It certainly is possible, but it seems like it could be a whole lot easier. I like modularity and loose coupling, but I wish the pieces fit together a little more easily. One thing that would help in this regard is license compatibility. If the qmail license were a traditional open-source license, someone could customize it to integrate with these sundry components and turn out a click-and-run email server.

In spite of all the trouble that one must endure while putting together a setup like this, once it is running, it tends to stay running without intervention. If there is another integrated open-source stack that people use for email, I haven’t found it yet.

Cost of Optimization is Greater than the Cost of Synchronization

Prof. Bill Pugh is a guy I’ve been paying attention to lately. He’s on the expert group for JSR-133, which specifies the new memory model and thread semantics in JDK 1.5 and is obviously of interest to someone working on a high-throughput Java object cache. He is also responsible for one of my favorite Eclipse plugins, FindBugs.

If you start reading about thread synchronization in Java, you will quickly bump into his paper entitled “Is Code Optimization Relevant?”, in which he humorously notes:

Cost of stupidity higher than cost of synchronization

Generally, I think this is true, mainly because people don’t seem to understand synchronization very well, based on what I’ve been seeing in various codebases lately. I suspect the unintended result of this claim in Bill Pugh’s presentation is that some developers just throw up their hands and say “The cost of synchronization is higher than the cost of stupidity, so when I’m programming a multithreaded application, I’ll just synchronize all the methods in certain classes!”.

In many cases, this works out to be a not-entirely-unreasonable approach, especially when you’re working under time pressure or cost pressure and just need to ship something that functions properly. However, when you’re writing components that are going to be used by lots of people in various applications, you have to approach the problem differently.
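
For illustration, that reflex looks like the class below: it is thread-safe, but every caller serializes on the same monitor whether it needs to or not.

// The blunt approach: mark every method synchronized and move on.
// Correct, but readers and writers all contend for one lock.
public class CoarselyLockedCounter {

    private long count;

    public synchronized void increment() {
        count++;
    }

    public synchronized long get() {
        return count;
    }
}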

Case in point: Whirlycache creates a little Item object every time an object is inserted into the cache, basically like this:

public void store(Object storeme) {
    // Simplified: the new Item wrapping the payload is handed off to the
    // cache's backing data structure (omitted here).
    new Item(storeme);
}

When the object is removed from the cache, the Item object becomes unreachable and is eligible for garbage collection by the JVM. This is not desirable behaviour, because the constant creation of transient objects fragments the heap and causes the garbage collector to run more frequently than it otherwise would.

So, thinking I was smart, I thought I would just store these Item objects in an object pool, specifically one based on Commons Pool. I figured that I’d write a little quickie test to see how well things were performing first, though. Here’s the tradeoff:

  1. Doing lots of new Item(), thereby causing heap fragmentation and excessive garbage collection versus
  2. Whatever the overhead is in speed for using an object pool

My tests were kind of amazing, actually. If there is one thing that can be said for new Object(), it’s that it is fast. 20 threads can create 1M Item objects in 2,113ms (this involves no synchronization at all). Using the GenericObjectPool from Commons Pool, the number jumped to 195,661ms! In other words, that’s nearly a 100x slowdown for the tradeoff. Hmm… doesn’t sound so appealing. After examining Commons Pool a bit more, I see why it’s so slow: lotsa synchronization. I’m not sure how much of it is avoidable.
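
A rough sketch of the kind of micro-benchmark I’m describing appears below, assuming a trivial stand-in Item class and the stock Commons Pool 1.x API (GenericObjectPool, BasePoolableObjectFactory); it is illustrative rather than the exact test code.

import org.apache.commons.pool.BasePoolableObjectFactory;
import org.apache.commons.pool.impl.GenericObjectPool;

// Sketch of the micro-benchmark described above (not the exact test code).
// 20 threads each perform 50,000 acquire/release cycles: first with plain
// construction, then through a Commons Pool GenericObjectPool.
public class PoolBenchmark {

    static final int THREADS = 20;
    static final int CYCLES = 50000;   // 20 x 50,000 = 1M total acquisitions

    // Trivial stand-in for Whirlycache's internal Item class.
    static final class Item {
        Item(Object payload) { }
    }

    public static void main(String[] args) throws Exception {
        // Baseline: new Item() every time, no synchronization anywhere.
        long plainMs = time(new Runnable() {
            public void run() {
                for (int i = 0; i < CYCLES; i++) {
                    new Item(new Object());
                }
            }
        });

        // Pooled: every cycle is a synchronized borrow/return pair.
        final GenericObjectPool pool = new GenericObjectPool(new BasePoolableObjectFactory() {
            public Object makeObject() {
                return new Item(null);
            }
        });
        pool.setMaxActive(-1);         // never block waiting for capacity
        long pooledMs = time(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < CYCLES; i++) {
                        Object item = pool.borrowObject();
                        pool.returnObject(item);
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        });

        System.out.println("new Item():        " + plainMs + "ms");
        System.out.println("GenericObjectPool: " + pooledMs + "ms");
    }

    // Run the same task in THREADS threads and return the wall-clock time.
    static long time(Runnable task) throws InterruptedException {
        Thread[] threads = new Thread[THREADS];
        long start = System.currentTimeMillis();
        for (int i = 0; i < THREADS; i++) {
            threads[i] = new Thread(task);
            threads[i].start();
        }
        for (int i = 0; i < THREADS; i++) {
            threads[i].join();
        }
        return System.currentTimeMillis() - start;
    }
}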

Keep in mind that if you are writing general applications, you ought to be using Commons Pool. From what I’ve been able to determine in the short time I’ve been examining the CVS repository, it is maintained, well-designed and flexible. This is not an indictment of Commons Pool in any way. Please consider using this package the next time you need an object pool.

However, my motivation for an object pool is specifically for use in Whirlycache. Now, the slower a cache becomes, the less useful it is. My experience building the backend for Whirlycache led me to Doug Lea’s wonderful util.concurrent package, where my eyes were really opened to some super techniques for writing code designed for use in a multithreaded context. Using some of the ideas I had seen there, I decided to see if I could do better than Commons Pool, which wasn’t very hard, but it did involve focusing entirely on the cost of synchronization. Here’s the general approach:

public Item acquire() {
    // Start at the slot most recently advertised as free by release().
    int i = possiblyAvailableIndex;
    while (!elements[i].lock()) {
        if (i == size) {
            // Guard against a stale, out-of-range hint.
            i = 0;
        } else {
            // The lock was taken; jump to a random slot rather than scanning
            // sequentially, so competing threads don't chase each other.
            i = random.nextInt(size);
        }
    }
    return elements[i];
}

public void release(final Item _item) {
    _item.unlock();
    // Advertise this slot to threads searching in acquire().
    if (_item.getQueuePosition() > 0) {
        possiblyAvailableIndex = _item.getQueuePosition();
    }
}

I introduced a granular locking object in the Item class that the lock() and unlock() methods synchronize on, along with a field that stores the Item object’s position in the array.
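
In sketch form (the field and method names here are illustrative rather than the exact Whirlycache source), the relevant part of Item looks something like this:

// Illustrative sketch of the granular locking inside Item.
public class Item {

    private final Object lockObject = new Object();   // per-Item monitor
    private boolean locked = false;
    private int queuePosition = -1;                    // index of this Item in the pool's array

    // Try to claim this Item; returns false right away if another thread holds it.
    public boolean lock() {
        synchronized (lockObject) {
            if (locked) {
                return false;
            }
            locked = true;
            return true;
        }
    }

    // Give the Item back so another thread's acquire() can claim it.
    public void unlock() {
        synchronized (lockObject) {
            locked = false;
        }
    }

    public int getQueuePosition() {
        return queuePosition;
    }

    public void setQueuePosition(final int queuePosition) {
        this.queuePosition = queuePosition;
    }
}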

Any call to acquire() basically zooms through an array of Item objects trying to get a lock on one. If the lock succeeds, the object is returned. The starting point for iteration through the array is set by the release() method, which basically tells all threads searching for a lock that one is available at elements[possiblyAvailableIndex]. If that lock isn’t available, rather than have 20 threads sequentially chasing each other around an array, it’s actually faster to have them start their searches at random locations. This approach runs my test case in 32,638ms, approximately 6x faster than using Commons Pool but still 15x slower than doing new Item().

Another point to emphasize here is that my object pool is no longer generic, since it tightly couples the Item class to the pool that serves it. But it seemed like an interesting technique, although I’m not quite certain that it is fast enough to tip the balance in the tradeoff one way or another.

All told, the simple approach of just going with a highly synchronized pool would have been sufficient for most applications. Using more granular synchronization locks reduces contention and increases speed, but it takes quite a bit of time to analyze how and where you can put those locks. Measured in developer time, the cost of the optimization is high relative to the benefit it delivers.

My search continues. There must be a way to do this.

Domain Name Nirvana and Damn Spam

Recently, I purchased a new domain name (a .org) and was pleasantly surprised to see that it had made its way into the root nameservers within a few minutes. It’s really a great thing to be able to do this. I was pleased.

I set up the site running WordPress and went to bed. I woke up the next morning to lots of blog-spam for gambling sites. I was less than pleased.

What puzzled me is that I told nobody about the website. This same scenario happened again this week, actually. Oddly enough, this blog gets some publicity and yet no comment spam.

It’s impressive how well and how quickly spammers can take an otherwise lovely technological experience and wreck it.