I found out that it’s surprisingly easy to make an RSS screen scraper with Template::Extract. I was able to whip up an RSS feed for ann arbor is overrated in about 15 minutes. This guide to Template::Extract and RSS had all the code I needed, although I modified it slightly to be a cron job instead of a CGI. Enjoy.
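
For anyone curious what this kind of scraper boils down to, here is a rough sketch of the idea in Python. It is not Template::Extract and not the script I actually run: a regular expression with named groups stands in for the extraction template, and the sample markup, the `extract_entries` and `build_rss` names, and the example.com URLs are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page markup; a real blog's HTML will look different.
SAMPLE_HTML = """
<div class="entry"><h2><a href="http://example.com/1">First post</a></h2><p>Hello there.</p></div>
<div class="entry"><h2><a href="http://example.com/2">Second post</a></h2><p>More text.</p></div>
"""

# The pattern plays the role of the extraction template: each named group
# captures one field of a repeated entry on the page.
ENTRY = re.compile(
    r'<div class="entry"><h2><a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a></h2>'
    r'<p>(?P<description>.*?)</p></div>',
    re.S,
)


def extract_entries(html):
    """Return one dict (title, link, description) per entry found in html."""
    return [match.groupdict() for match in ENTRY.finditer(html)]


def build_rss(title, link, entries):
    """Serialize the scraped entries as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Scraped feed for " + title
    for entry in entries:
        item = ET.SubElement(channel, "item")
        for field in ("title", "link", "description"):
            ET.SubElement(item, field).text = entry[field]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    entries = extract_entries(SAMPLE_HTML)
    print(build_rss("Example blog", "http://example.com/", entries))
```

Run from cron instead of as a CGI, a script like this would fetch the page, then write the output to a static file that the web server hands out, so the scrape happens on a schedule instead of on every request.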


5 responses to “Ann Arbor is both overrated and RSSed”

  1. Andy says:

    Your link to the Template::Extract guide is malformed: it uses “ref” instead of “href.”

  2. Thanks for spotting that; it’s fixed now.

  3. l.m.orchard says:

    If you dig XSLT, I’ve got another way to do it, mostly using XPath expressions on Tidy’d HTML:
    For most sites, I can usually throw together a scraper from a few paths in under 10 minutes.

  4. ann arbor is overrated says:

    Yeah, I’m often bad at responding to e-mail (I’m in the process of switching to a better webmail provider) and I’m also not sure how to set up an RSS feed with Diaryland. But I have comments on my blog now!
    I wasn’t sure what to think about this – then I remembered you’d asked about it and everything. I usually edit entries for grammar and wording about five times after they’re posted, so you might end up with different versions. Anyhow, I’m glad you like the site.

  5. George says:

    The comments look good; I think they’ll make your site even better. I did some research on setting up RSS on Diaryland but couldn’t find anything.
    I think most bloggers edit their entries a few times before deciding they like what they see. My script runs every 4 hours and builds an RSS feed from whatever is on the site at that moment. If you’re in the middle of updating an entry when it runs, your later changes will show up in the feed 4 hours afterwards. I can change the 4-hour interval if you want…
    You should also add this tag to your <head>:
    <link rel="alternate" type="application/rss+xml" title="RSS" href="http://george.hotelling.net/rss/annarborsucks.rss">
    to tell RSS readers where to find your RSS feed.
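
To show why that tag matters, here is a small Python sketch of how a feed reader might look for it. This is only an illustration under my own assumptions: the `FeedLinkFinder` class, the `find_feeds` helper, and the sample page are hypothetical, and real readers are more forgiving about markup than this.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/rss+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, so split before checking.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        mime_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel_tokens and mime_type == "application/rss+xml" and href:
            self.feeds.append(href)


def find_feeds(html):
    """Return every RSS feed URL advertised by the page's link tags."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page that carries the tag from the comment above.
SAMPLE_PAGE = """
<html><head>
<title>ann arbor is overrated</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://george.hotelling.net/rss/annarborsucks.rss">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(find_feeds(SAMPLE_PAGE))
```

Without the tag, a reader pointed at the blog's front page has nothing to go on and the feed URL has to be typed in by hand.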
