I can’t live without my news aggregator, Bloglines. That’s not to say that Bloglines has some mojo that other aggregators like FeedDemon, NetNewsWire, AmphetaDesk, or even My Yahoo! don’t have; Bloglines is just the right fit for me. I think I would be more upset if Bloglines were gone for 24 hours than if my email were down for 24 hours. A lot of people seem to agree with me, because Wired News is asking “Will RSS Readers Clog the Web?” Luckily, this is a somewhat solved problem. (Note: for this post I’ll refer to any syndication format, be it Atom or one of the seven or so RSS formats, as RSS. Deal.)
The first solution lies in how current RSS generators work. Take Movable Type, for example: it generates the site’s RSS feed and saves it as a regular static file. The beauty of this approach (as opposed to, say, generating the feed dynamically on every request) is that Apache, IIS, or whatever web server you use is pretty good at telling user agents what they can cache and at answering conditional requests. If a user agent says “don’t send this unless it has changed since the last time I fetched it,” and it hasn’t changed, Apache replies with a short 304 Not Modified instead of the whole feed.
Of course, that assumes user agents are smart enough to ask. “Atom aggregator behavior” outlines some very important tech specs that RSS readers should support. RSS feeds can also say how often they are updated (RSS 2.0’s <ttl> element, for instance), which an aggregator should take into account. The Perl module XML::RSS::TimingBot is a pretty good example of using information from both the server and the client to save bandwidth.
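On the client side, being “smart enough to ask” comes down to two habits: send back the validators from the previous fetch, and respect the feed’s own update hints. The Python sketch below shows both under stated assumptions. It is not a port of XML::RSS::TimingBot; the function names, the one-hour default, and the 30-minute floor are all hypothetical choices, and only RSS 2.0’s `<ttl>` (in minutes) is read.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def conditional_headers(cache):
    """Build request headers from validators saved on the previous fetch."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers


def next_poll(feed_xml, fetched_at,
              default=timedelta(hours=1), minimum=timedelta(minutes=30)):
    """Schedule the next fetch from the RSS 2.0 <ttl> element, if present.

    default and minimum are hypothetical politeness settings, not spec values.
    """
    ttl = ET.fromstring(feed_xml).findtext("channel/ttl")
    try:
        minutes = int(ttl)
    except (TypeError, ValueError):
        return fetched_at + default  # no usable hint: fall back to the default
    # Honor the feed's hint, but never poll more often than the floor.
    return fetched_at + max(timedelta(minutes=minutes), minimum)
```

A real aggregator would store the `ETag` and `Last-Modified` response headers in `cache` after each successful fetch and skip parsing entirely when it gets a 304.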
Unfortunately, there will always be bad eggs that don’t follow these conventions, and they can eat up a lot of bandwidth. A service like FeedBurner seems like a good line of defense. You give it the URL of your RSS feed and tell subscribers to get your updates from FeedBurner instead. That way FeedBurner worries about user agents getting out of hand, and you don’t feel the bandwidth crunch. Of course, then you’re relying on a third-party service, but I haven’t seen any CGI or PHP scripts that will do this for you (yet!).
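The core of such a relay script is small: fetch the origin feed at most once per interval and hand every subscriber the cached copy. Here is a minimal Python sketch of that idea, not how FeedBurner actually works. `FeedRelay` and its `fetch` callable are hypothetical, the 30-minute interval is an arbitrary default, and a real script would also need HTTP handling, error handling, and persistence.

```python
import time


class FeedRelay:
    """Serve one upstream feed to many subscribers (hypothetical sketch).

    fetch: hypothetical callable that downloads the feed body from the origin.
    """

    def __init__(self, fetch, min_interval=1800, clock=time.monotonic):
        self.fetch = fetch
        self.min_interval = min_interval  # seconds between origin fetches
        self.clock = clock                # injectable so tests can fake time
        self.body = None
        self.fetched_at = None

    def get(self):
        now = self.clock()
        stale = self.fetched_at is None or now - self.fetched_at >= self.min_interval
        if stale:
            self.body = self.fetch()  # the only request that reaches the origin
            self.fetched_at = now
        return self.body  # everyone else is served from the cached copy
```

However aggressively subscribers poll the relay, the origin site sees at most one request per `min_interval`.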
Rate limiting isn’t always a good idea, though. Take Slashdot, for example: if you load their feed more than once every 30 minutes one too many times, they ban you for 72 hours. Unfortunately, they don’t make exceptions for sites that redistribute their feed, so if (for example) Bloglines or LiveJournal loads the feed too often (or, more likely, their own rate limiter has a problem), the site gets banned, and then 1105 Bloglines users or 536 LiveJournal users fire up desktop aggregators and start hammering Slashdot’s server directly. This isn’t a hypothetical situation, either. Bloglines is currently banned from Slashdot’s RSS feed, so instead of sending one RSS feed slightly more often than their limit allows, Slashdot is sending roughly 1000 RSS feeds at the limit.
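A Python sketch of the kind of policy described above may help show why it backfires. The 30-minute interval and 72-hour ban come from the post; the strike threshold is a hypothetical value (the exact count isn’t stated), and `StrikeLimiter` is an illustration, not Slashdot’s code.

```python
class StrikeLimiter:
    """Ban clients that fetch too early too often (hypothetical sketch).

    interval and ban match the numbers in the post; max_strikes is assumed.
    """

    def __init__(self, interval=30 * 60, ban=72 * 3600, max_strikes=3):
        self.interval = interval
        self.ban = ban
        self.max_strikes = max_strikes
        self.last = {}          # client -> time of last allowed fetch
        self.strikes = {}       # client -> count of early fetches
        self.banned_until = {}  # client -> time the ban expires

    def allow(self, client, now):
        if now < self.banned_until.get(client, float("-inf")):
            return False  # still banned
        last = self.last.get(client)
        if last is not None and now - last < self.interval:
            # Too early: record a strike and ban once the threshold is hit.
            self.strikes[client] = self.strikes.get(client, 0) + 1
            if self.strikes[client] >= self.max_strikes:
                self.banned_until[client] = now + self.ban
                self.strikes[client] = 0
            return False
        self.last[client] = now
        return True
```

The weakness is in what `client` stands for. If the server keys on an IP address and one address belongs to Bloglines, that single key represents over a thousand readers, and banning it pushes all of them onto the origin server individually.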
Shrook for OS X takes a pretty interesting approach to this. Shrook occasionally loads a feed directly from the site, and Shrook clients report the updates they find to a central database. Between direct loads, Shrook checks that database, so if a feed changes between normal updates, Shrook finds out and fetches the update. This is what weblogs.com was supposed to fix, but in my experience most webloggers don’t ping weblogs.com.
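The hybrid scheme can be sketched in a few lines of Python. This is a guess at the general shape, not Shrook’s actual protocol: `UpdateRegistry` stands in for the central database, `should_fetch` is a hypothetical client-side check, and the six-hour direct-polling interval is an arbitrary assumption. Times are plain seconds.

```python
class UpdateRegistry:
    """Hypothetical central service where clients report feed changes."""

    def __init__(self):
        self.last_changed = {}  # feed URL -> newest change time reported

    def report(self, url, changed_at):
        # Keep only the newest report, so a stale client can't roll it back.
        if changed_at > self.last_changed.get(url, float("-inf")):
            self.last_changed[url] = changed_at

    def changed_since(self, url, since):
        return self.last_changed.get(url, float("-inf")) > since


def should_fetch(url, now, last_fetch, registry, direct_interval=6 * 3600):
    """Decide whether a client should hit the origin site right now."""
    if now - last_fetch >= direct_interval:
        return True  # occasional direct load, so the registry isn't the only source
    return registry.changed_since(url, last_fetch)  # cheap check between loads
```

Whichever client happens to notice a change first calls `report`, and every other subscriber learns about it from the registry instead of polling the origin site.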
The upshot of all of this is that RSS is taking off. If we’re moving beyond the starry-eyed PointCast RSS-will-save-the-world phase and into the holy-crap-we’ve-got-to-fix-this phase, the technology is actually taking hold. As far as I’m concerned, that’s a good thing.


2 responses to “Syndication Limits”

  1. One thing about LiveJournal’s feed usage is that LJ does one fetch per feed every hour or so, regardless of the number of subscribers. New items are simply inserted as new journal entries in the pseudo-journal for that feed. The people who would otherwise be hammering the feed end up deluging LJ instead.
    I’ve heard that Bloglines doesn’t do the same, which is a bit surprising to me. Pooling feed checks would seem to be one of a slew of benefits possible with a centralized web-based feed aggregator…

  2. Earthstink and Time Warner Schmable.

    Well, it was nice while it lasted. Time Warner, who is selling Earthlink high-speed, showed up on time for their appointment Tuesday. Within minutes I was back. Back to my bloglines, which I agree with George as to its adquatulance (sic);…
