Pinterest and copyright

There’s a little flare-up going on over at Hacker News over a blog post about Pinterest’s TOS (dated March 29, 2011, which I note only in case it materially changes in the future).  Most of the comments on HN are infuriating because the top-voted ones are even more staggeringly naive than usual.

So Pinterest has a TOS doc.  People are starting to pay attention because Pinterest is getting really, really, really big.  The theme of many of the HN comments is that this is typical cover-your-ass boilerplate language and that it’s totally normal, so therefore just shut up and accept it.  This is nothing more than teenage peer pressure logic applied to what is possibly the hottest Internet startup in the country right now.

In other words, time to start focusing on it a bit more.

I actually care a lot about terms of service and how users interact with them.  And while I have a lot of respect for what Pinterest has built (and have learned a few things from studying them), TOS and copyright are areas that we spent a lot of time working on at StyleFeeder in conjunction with our legal counsel and advisors (including one world-renowned expert in copyright law).  By contrast, here’s the relevant portion of StyleFeeder’s pre-acquisition TOS that I happen to have in an old file on my laptop.

This, people, is how to write a TOS that allows the business to function with flexibility and protection yet doesn’t overreach (bold text is mine).  

3. User-Posted Content.

StyleFeeder depends on the content that you post. In fact, that’s the whole point of the Site. While we encourage you to add links to great products and to post your profiles and reviews, some content just isn’t appropriate for the Site, including, but not limited to, links to illegal or counterfeit items or sexually-explicit, racist, or vulgar content. While we have no obligation to monitor use of the Site, we do reserve the right to review, modify and/or remove content, for example, content found offensive by other users or content found to be illegal.

StyleFeeder is not responsible for the manner or circumstances by which third parties may access such public content and is under no obligation to disable or otherwise restrict this access, although we reserve the right to do so when we deem appropriate. By posting such items, information, messages and comments in a public area, you are granting permission to us to use, display, modify, distribute and otherwise exploit such items, information, messages and comments in connection with the Site and otherwise in connection with our business.

9. Copyrights.

StyleFeeder-posted content included on the Site, such as text, graphics, logos, data compilations, APIs, software and the compilation of all content on the Site, is the property of StyleFeeder and its licensors, and is protected by United States and international copyright laws. StyleFeeder makes no claim to third-party content that is rightfully posted on the Site.

Notice that this language is markedly tighter than what Pinterest currently chooses to use.  I wish Pinterest put the same level of thought and innovation into their TOS as they did with their product.

PS Also note that imgur’s TOS doesn’t overreach.  IANAL so perhaps it’s not as good, but my reading of it is that it is philosophically very different from that of Pinterest.

How to organize your CDN hostnames

I’ve used the following scheme to manage my hostnames on CDNs for the past few years and I find that it is particularly clean and easy to work with.  While the general scheme I propose here has no ties to any software platform, framework or CDN, I think it would be quite cool if Web framework designers built in support for this.  That being said, I’ve done this on two Java sites and one Rails site using a range of CDNs from Akamai to Amazon.  This is simple, so follow along.

All of your CDN hostnames will follow this scheme:

type-serial.environment.something.tld

The components are:

  1. type: typical values include jscss, product-image, avatar.  Basically, you put a whole class of content on one hostname.  Sometimes these divisions are artificial, based on performance or based on architecture.  For example, you may keep your product images (if you run an e-commerce site) inside a big S3 bucket that you want to front with a CDN.  You may want your Javascript and CSS served off of one host.  Anyway, this is your chance to make functional groupings.
  2. serial: You start with 0.  If you have a website that tends to present many images on one page, it can be beneficial to serve the same images off of several hostnames for performance reasons.  It’s also useful to have a serial field in case you migrate from one CDN provider to another since you can just bump up the serial number during the migration.  These are nuances, so I will come back to this shortly.  But if you run a small site, the value for ‘serial’ is 0.
  3. environment: typical values correspond to dev, integration, qa, staging or prod.  You will have your own names for these; obviously, you will want to use your own terminology.
  4. something.tld: generally, it is a good idea to serve your CDN accelerated assets from a domain name that is different to your main website.  For example, if your site is www.something.com, you should buy another domain like something-static.net.  There are a few reasons for this, but generally you neither need nor want your HTTP cookies being sent to your CDN hosts because this is normally not necessary for serving up static files that don’t differ from one visitor to another.  There are also security benefits (in case a host on your CDN’s network gets cracked) and performance (unnecessary HTTP overhead sending useless cookies).

And that’s it.

When you put it all together, you might end up with something like this for your production hostnames (I’ll use this domain, whirlycott.com, as the example site):

jscss-0.prod.whirlycdn.net
avatars-0.prod.whirlycdn.net
blog-images-0.prod.whirlycdn.net

Your dev, qa and staging hostnames are easy to guess from this scheme, so I shall avoid repeating them.
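
If you want to generate them rather than guess, here is a minimal sketch of the scheme in python.  The function name cdn_hostname, the environment names and the whirlycdn.net domain are just illustrations for this example; substitute your own.

```python
def cdn_hostname(asset_type: str, serial: int, environment: str, domain: str) -> str:
    """Compose a hostname of the form type-serial.environment.something.tld."""
    if serial < 0:
        raise ValueError("serial must be 0 or greater")
    return f"{asset_type}-{serial}.{environment}.{domain}"


# Environment names and the domain are examples only; use your own terminology.
for env in ("dev", "qa", "staging", "prod"):
    print(cdn_hostname("jscss", 0, env, "whirlycdn.net"))
```

Keeping this in one function means the rest of your code never concatenates hostnames by hand, which makes a later CDN migration (bumping the serial) a one-line change.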

I mentioned a nuance in relation to the serial number field.  If you find yourself in a position where you are generating web pages that have lots of, say, images, you can split up your content across multiple hostnames quite easily:

product-images-0.prod.foocdn.net
product-images-1.prod.foocdn.net
product-images-2.prod.foocdn.net

The advantage here is that your browser will typically download from multiple hostnames faster than from a single hostname (don’t go crazy with this and generate a hundred hostnames).  Of course, you do incur an extra DNS lookup, so you have to consider that.  When you are generating the serials for your assets, I recommend generating the same serial number (and therefore a consistent hostname) for a given piece of content.  If you have a website with pictures of butterflies, you might have a bunch of jpegs served like this  (note the alternating serial numbers):

http://animal-pics-0.prod.mycdn.net/blue-butterfly.jpg
http://animal-pics-1.prod.mycdn.net/green-butterfly.jpg
http://animal-pics-0.prod.mycdn.net/red-butterfly.jpg
http://animal-pics-1.prod.mycdn.net/yellow-butterfly.jpg

If you have fifty photos per page on your site, you should ideally generate the same hostname for a given image every time it is served, to improve cacheability (there may also be some tangential benefits for Google image search).  Let’s say you want browsers to download animal-pics from two hostnames.  In this case, use a standard hash/mod approach to spread your assets roughly evenly across your two hostnames.  Note that you will need to do this server-side.  In python, you’d do it like this:

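>>> import hashlib
>>> hash = hashlib.sha1()
>>> # update() takes bytes in Python 3, hence the b"" literal
>>> hash.update(b"blue-butterfly.jpg")
>>> result = hash.hexdigest()
>>> result
'775da2f0b764b712b7c3615f479794e0095cc8ce'
>>> # read the hex digest as one big integer, then mod by the number of hostnames
>>> serial = int(result, 16) % 2
>>> serial
0
>>>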

SHA-1 produces a 160-bit digest, which the code above reads as one very large integer.  Python will handle large numbers for you automatically.  In Java, you have to use a BigInteger and DigestUtils to coax it into something you can do actual math with.  In this case, for the blue-butterfly.jpg, the correct serial is 0.  If you repeat this test on your python repl using “green-butterfly.jpg”, you will notice that the serial number is 1.
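
To tie the hash/mod step back to the hostname scheme, here is a rough sketch of how you might build the full URL server-side.  The helper names serial_for and asset_url and the host_count parameter are my own inventions for this example, and the mycdn.net domain is the same placeholder used above.

```python
import hashlib


def serial_for(asset_name: str, host_count: int) -> int:
    """Map an asset name to a stable serial in range(host_count)."""
    # hashlib wants bytes, so encode the name before hashing.
    digest = hashlib.sha1(asset_name.encode("utf-8")).hexdigest()
    # Treat the hex digest as one large integer and mod by the hostname count.
    return int(digest, 16) % host_count


def asset_url(asset_type: str, environment: str, domain: str,
              asset_name: str, host_count: int = 2) -> str:
    """Build a URL whose hostname is always the same for a given asset."""
    serial = serial_for(asset_name, host_count)
    return f"http://{asset_type}-{serial}.{environment}.{domain}/{asset_name}"


for name in ("blue-butterfly.jpg", "green-butterfly.jpg"):
    print(asset_url("animal-pics", "prod", "mycdn.net", name))
```

One thing to keep in mind: if you later change the number of hostnames, many assets will land on a different serial, so browsers and the CDN will have to fetch them again under the new hostname.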

What I like about this layout is that it scales well, is easy to understand, easy to debug and simple to implement.  You do, however, end up with a proliferation of hostnames, but if you are successful, you will want something closely resembling this, so take the extra hour to set up your site the right way.  I like to avoid doing the same thing twice.

Some thoughts about Scrum

I was involved in a wee little exchange on ye olde Twitter social medium over the weekend with @dcancel and @pt in which I said I didn’t like certain aspects of Scrum (which, by the way, is a software development methodology).  I was asked to elaborate.

I think Scrum has a lot of good aspects.  I’ll go a step further and say that for most startups and probably most software projects, Scrum should be your default.  You should be required to make a case for not using it before moving to something else.  However, there are two effects of Scrum that I don’t like.

Most importantly, I think Scrum does have a chilling effect on innovation.  Common symptoms of this are people saying things like “Stick to what is in the sprint” or “That’s a super idea – put it in the backlog and let’s consider it during our next planning meeting.”  Innovation and creativity don’t respond well to statements like this.  They appear suddenly and without warning and are opportunities that you must seize and run with.  Damn your plans.  Scrum is part of a family called “agile methods,” and, compared with heavier methodologies, it absolutely is agile.  Well, maybe sometimes it just isn’t agile enough.  But I guess it depends on what you are optimizing for.

The second thing that I don’t like about Scrum is that if you have a highly functional team that is cranking, applying Scrum to the chemistry of your team will absolutely slow things down.  As lightweight as it is, there most certainly is overhead involved.  Perhaps there are other benefits of having Scrum in place, but speed isn’t one of them.  That being said, Scrum can be fairly lightweight and is probably the most responsible choice you can make in the face of actual methodologies that you can, say, buy books about.

Now, what do I like if it’s not Scrum?

I like goals.  The objective of a sprint (using Scrum parlance) is to complete the specified work.  Hopefully that maps to your overall strategy.  Hopefully that makes your goals a few steps closer than before.  The reality is that reaching your goals is the most important thing, not necessarily how they are achieved.  If you want to boost conversions, increase registrations, reduce latency, etc., it’s way better to stick with a few key numbers that you can measure against and simply chase those until you’ve moved whatever needle you are measuring.  It is very frequently the case that you will have no idea what will end up working for you in terms of actual tactics.  But the ability to make guesses, learn, retry and iterate is going to get you there.  Sticking to a plan and realizing halfway through a sprint that things are Not Going Well is not going to lead to the desired outcome.  And constantly changing the composition of a sprint makes the whole process seem very flimsy.

But Scrum is just a hammer in your toolkit.  Choose it for the right job and it can be very valuable (yes, really!).  If you adopt it during a phase in your company’s lifecycle when you are trying to focus on innovation, consider yourself warned.

Do you have techniques to make Scrum work better in an environment that requires innovative thinking put into practice?

 

Open source election software

Read this article about the announcement earlier this week of an open source election system that was made publicly available. Now read this wee little blog post about why this isn’t providing us much in the way of guarantees.

The open source nature of the code is helpful in the long run, but it provides absolutely nothing in the way of assurance to voters. Ben Adida’s Helios Voting System provides voters with a cryptographic, verifiable receipt that their vote was counted. Commercial implementations or Open Source versions of this software would both still need to provide a cryptographic receipt. That’s your proof. That’s something that can support the weight of democracy.

ID Selector terms of service

Every now and then, I get it into my head that I’m going to release OpenID support on StyleFeeder. In fact, I have the code mostly written, but there’s always some nit-picky aspect that doesn’t work as well as I want it to, so I leave the code aside and get on with my life.

Some time ago, I came across ID Selector from JanRain, one of a few important companies in the identity space. They have a little javascripty/css thingy that you can put on your site to help users choose from a list of popular OpenID providers and then type in their username. It’s neat and it sure beats typing in URLs.

One thing that you get to do as the founder of a venture funded startup is sign contracts. Woo, fun! Every time I sign up for something online now, I can’t help but read The Fine Print. So it was with great dismay that I read the TOS for ID Selector (see below for some delicious excerpts). The punchline is this: I can’t think of a better way to discourage people from using this cute little snip of javascript that any competent programmer could put together without material effort.

These terms of service, dear reader, are stupid because they are in nobody’s best interest. Site operators should have freedom to adjust the behavior of the widget code as necessary. JanRain should be focused on OpenID adoption, not trying to control their rights for the UI component.

Consistent behavior of the ID Selector across websites is important to ensure that users get what they expect on each usage. If you want an example of a model that works reasonably well in this regard, look no further than the feed icons and the guidelines for their use that Mozilla put forth. Open. Easy. Flexible.

The identity space is moving slowly enough without unnecessary impediments like this.

But I don’t like to whine without proposing some solutions, so here’s where I’m going to stop and wait to see what happens:

  1. JanRain – please change your TOS to relax these unnecessary restrictions
  2. Also: release a standalone version of the ID Selector under some kind of an open license (or dual license) so sites that don’t want to have your code loaded in at runtime don’t have to
  3. If JanRain won’t do #2, I’m hereby offering on behalf of StyleFeeder, Inc. to fund someone to create a standalone ID Selector that will be released under better terms. Contact me if you want to be that person.

Some fun excerpts from the ID Selector TOS. No, I am not kidding.

3. Ownership rights. The IDSelector is owned by us and our licensors. The IDSelector is protected by copyright and other intellectual property laws and treaties. We and our licensors reserve all rights not specifically granted to you. You may not reverse engineer, decompile, or disassemble any aspect of IDSelector . You may not modify, adapt, or create derivative works from the IDSelector . Do not remove proprietary notices. Do not help any one else to do any of the things prohibited in this paragraph.

[…]

6. Your responsibilities. You must use the IDSelector web site to obtain an IDSelector tool and/or code located at idselector.com. You may not copy code from another web site to use the IDSelector.

[…]

7. Your rights to use the IDSelector . We offer you the following rights to use IDSelector provided that you continue to comply with the terms of this agreement. You may not remove, distort or alter any element of the IDSelector (including the HTML and JavaScript code).

No, that’s not why perl 5 is dying

Interesting, but perl 5 is dying because nobody created a good way for folks to build web applications with perl. CGI scripts? No, sorry. mod_perl? Died an ugly death somewhere between Apache 1.x and the new MPMs in Apache 2.x. All the interesting frameworks (e.g. Mason, embperl) depended on mod_perl. Those are all basically dying of upstream dehydration.

There’s no technical reason that a conceptual equivalent of Perl on Rails couldn’t have been created years ago. But the innovation shifted outside of the Perl community and those still left were too busy figuring out what was going on with perl 6 to help people figure out how to put a perl-based website together. And don’t get me started about threading.

Perl once was the duct tape of the Internet, but those days are long gone. It’s too hard to connect Perl to a scalable website anymore, so it’s pretty much over until someone can figure out how to change that. It doesn’t hold up next to the options available in PHP or modern Java… not by a long shot.

Perl still has all the good stuff: CPAN and its wildly rich collection of modules come to mind. But I’m not sure that this can save Perl as a platform.

Sidenote: it’s a crying shame that all that great code in CPAN will end up being rewritten over the coming years. What a waste of time and effort. Just another reason why a multi-language VM is such an important part of a long term strategy.

So long, YottaMusic

I’ve enjoyed using YottaMusic ever since Jake told me about it last year. I signed on and became a paying Rhapsody customer but I used the YottaMusic frontend for streaming my music. I was wondering why YottaMusic shut down over Christmas and now I know.  If anybody knows of another service like it, I’d love to know. The Rhapsody.com website sucks rhino and I’m certainly not going to continue paying for that slow, buggy heap of junk.

Thanks for killing one of the best legit music experiences, Rhapsody. You’re losing at least one customer (me) over this nonsensical decision.