taylorbarstow.com

simple_pagination

I just pushed a new project to github: simple_pagination. It is a dead simple, standalone pagination library.

Some might say I’m reinventing the wheel, but I don’t see it that way. I realize there are some great pagination libraries for Rails/ActiveRecord (will_paginate), and I’m not trying to compete with or replace them. Rather, I see simple_pagination as a complement to these projects. It is useful when you aren’t using ActiveRecord but still have data to page through, such as search results consumed from a remote API.
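To make the idea concrete, here is a minimal Ruby sketch of the arithmetic a standalone paginator has to handle. It is not simple_pagination’s actual API: the `Paginator` class and its method names are hypothetical, and it assumes you know the total number of entries up front.

```ruby
# Hypothetical sketch of standalone pagination math; NOT simple_pagination's real API.
class Paginator
  attr_reader :page, :per_page, :total_entries

  def initialize(page:, per_page:, total_entries:)
    raise ArgumentError, "per_page must be positive" unless per_page.positive?
    @per_page = per_page
    @total_entries = total_entries
    # Clamp out-of-range page numbers (e.g. ?page=999) into 1..total_pages.
    @page = page.clamp(1, [total_pages, 1].max)
  end

  # Ceiling division: 25 entries at 10 per page is 3 pages.
  def total_pages
    (total_entries + per_page - 1) / per_page
  end

  # Zero-based index of the first entry on the current page.
  def offset
    (page - 1) * per_page
  end

  def previous_page
    page > 1 ? page - 1 : nil
  end

  def next_page
    page < total_pages ? page + 1 : nil
  end

  # Cut the current page out of an in-memory array of results.
  def slice(items)
    items[offset, per_page] || []
  end
end
```

For a remote API you would more likely pass `offset` and `per_page` as the API’s start/limit parameters (their names vary by API) than call `slice` on a local array.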

(In fact, if mislav wanted to, he could refactor will_paginate to use simple_pagination. I doubt he wants to though. I know I wouldn’t want to.)

I hope other people find this useful—I know I’m tired of writing pagination logic when dealing with non-ActiveRecord data, which I am doing more and more of. Anyway, enjoy!

Test Awareness Month

It’s Test Awareness Month, folks, and I guess I’d better do my part. For one thing, let me echo Bryan Liles and say that you should Test All The Fucking Time. I’m not going to write a big monologue on testing (I love it, and you should too), but here’s a super useful link if you are wondering how to go about testing Rails apps (it’s not always obvious; how should one test ActiveRecord relationships, for example?).

Happy testing!

The Adobe AIR Socket Implementation Sucks

AIR sockets write data asynchronously: the data you send to the socket goes into a buffer, and the socket implementation decides when to actually write it out. Problem is, there is no way of knowing how much data has been written vs. how much is still in the buffer. There is no event to subscribe to and no bytesWritten property. There is a flush method on the socket, but it returns immediately without blocking, so you can’t use it to force a synchronous write. (Since AIR doesn’t support multiple threads yet, writing synchronously isn’t really an option anyway.)
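For comparison, here is roughly the bookkeeping I’m asking for, sketched in Ruby rather than ActionScript. `TrackedWriter` is a hypothetical wrapper, not part of any library: it queues outgoing bytes, pushes them with `IO#write_nonblock` so it never blocks (in keeping with a single-threaded runtime), and exposes `bytes_written` and `bytes_pending`. “Written” here means accepted by the OS, which is exactly the count the underlying write call already returns. The `exception: false` option assumes Ruby 2.3 or later.

```ruby
# Hypothetical non-blocking writer that tracks how much data is still buffered.
class TrackedWriter
  attr_reader :bytes_written

  def initialize(io)
    @io = io
    @buffer = "".b      # outgoing bytes the OS has not accepted yet
    @bytes_written = 0  # bytes the OS has accepted so far
  end

  # Queue data and try to send it right away; never blocks.
  def write(data)
    @buffer << data.to_s.b
    flush
  end

  def bytes_pending
    @buffer.bytesize
  end

  # Hand as much of the buffer to the OS as it will take without blocking.
  # Returns the number of bytes still pending.
  def flush
    until @buffer.empty?
      n = @io.write_nonblock(@buffer, exception: false)
      break if n == :wait_writable   # OS buffer is full; try again later
      @bytes_written += n
      @buffer = @buffer.byteslice(n..-1)
    end
    bytes_pending
  end
end
```

A caller can poll `bytes_pending` (or call `flush` from an event loop) to know when everything has left the application’s buffer.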

WTF Adobe? This is a glaring omission. Adrian raises an interesting question: did they actually do this on purpose? How hard could this possibly be?! I mean, they are simply exposing an underlying socket implementation provided by the OS, right? And every socket implementation on the block can tell you how much data is in the stupid buffer. grrrr.

Someone posted this as a bug on AIR’s issue tracker back in April. They didn’t get a response until August. Want to know what it said? “This is an enhancement request, i’ve logged it internally.” Sweet guys, thanks for the update.

Alas, it’s back to platform shopping for me. I hope Adobe can figure this out. Not only is it annoying, it makes them look downright stupid. Oh well.

Daemons in Ruby; or, Frustration

Over at unbiasd, we use a couple of daemons to keep our web interface snappy. Rather than letting our mongrels sit around and handle long-running tasks (such as outbound web requests), we offload these responsibilities using the delayed_job plugin. Currently we’re using collectiveidea’s branch because it includes a nifty script to run delayed jobs in a daemon environment using the daemon generator plugin.

The bad news? The daemon crashes all the time. Sure, I could write a monitor daemon to restart it when it crashes (daemon generator even includes a prepackaged one of these). But that seems somehow wrong—what if my monitor starts crashing? Write another monitor? What if that starts crashing? In programming I believe in finding and solving the real problem rather than ignoring it and programming around it. The former is akin to an elegant solution, the latter to brute force.

So how do I proceed? Step 1 is clearly to remove all third-party code, or at least to become intimately familiar with it. Ideally I’d like to keep using delayed_job, so I’m going to whittle away at the other third-party code first. This means getting rid of my daemon generator crutch. This has some obvious advantages:

  1. I can use tobi’s delayed_job, which is clearly more up to date than collectiveidea’s
  2. I can learn the ins and outs of writing daemons in Ruby
  3. Hopefully I can write a good daemon and submit it back to tobi (with the magic of github)
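As a starting point for item 2, here is a minimal sketch of the classic Unix double-fork daemonization using only Ruby core calls (`fork`, `Process.setsid`, `IO#reopen`). The `daemonize` helper and its pid-file handling are my own illustration, not code from delayed_job or daemon generator, and it assumes a Unix-like OS. Ruby 1.9 and later also ship `Process.daemon`, which does the fork/setsid/redirect steps in one call.

```ruby
# Hypothetical helper: detach from the terminal with the classic double fork,
# then record the daemon's pid. Unix-only.
def daemonize(pid_file, log_file: File::NULL)
  pid_file = File.expand_path(pid_file) # resolve relative paths before chdir("/")
  log_file = File.expand_path(log_file)

  exit!(0) if fork   # first fork: the launching process exits immediately
  Process.setsid     # start a new session with no controlling terminal
  exit!(0) if fork   # second fork: a non-session-leader can't reacquire a tty

  Dir.chdir("/")     # don't keep the launch directory busy
  File.umask(0o022)
  $stdin.reopen(File::NULL)
  $stdout.reopen(log_file, "a")
  $stderr.reopen($stdout)

  File.write(pid_file, Process.pid.to_s)
end
```

A worker script would call something like `daemonize("tmp/pids/worker.pid", log_file: "log/worker.log")` (hypothetical paths) right before entering its job loop. The process that calls it exits at once; only the detached grandchild keeps running.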

So this is an introductory post—a prelude to a journey. More to come.

URL Canonicalization

Over at unbiasd, we have a URL canonicalization problem. You see, unbiasd brings together a bunch of news feeds from various websites and publishes selected articles from these feeds on the homepage. For each recently published article, we use Ice Rocket to search for back links to the article. This is how we power our “blog reactions” feature.

This works great—except for one problem. The article URLs given in feeds are not always the same URLs that bloggers would be linking to. FeedBurner is an obvious example of this. While some bloggers might mistakenly link to a FeedBurner URL, they probably meant to link to the actual article’s permalink instead.

The solution (in my mind) was simple: do a HEAD request on each URL before doing a search for blog reactions. If the HEAD responds with a Location header, then use the referenced URL rather than the original.
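Here is a minimal Ruby sketch of that HEAD-and-follow approach using Net::HTTP from the standard library. `canonical_url` and `DEFAULT_HEAD` are hypothetical names, and the HEAD call is injectable so the logic can be exercised without touching the network. It follows up to `max_hops` redirects (pass `max_hops: 1` for the single lookup described above) and resolves relative `Location` values with `URI.join`.

```ruby
require "net/http"
require "uri"

# Default HEAD lookup: returns [status_code, location_header_or_nil].
DEFAULT_HEAD = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    res = http.head(uri.request_uri)
    [res.code.to_i, res["Location"]]
  end
end

# Follow Location headers from HEAD responses until a non-redirect
# (or until max_hops lookups have been made), then return the final URL.
def canonical_url(url, max_hops: 5, head: DEFAULT_HEAD)
  uri = URI.parse(url)
  max_hops.times do
    status, location = head.call(uri)
    break unless (300..399).cover?(status) && location
    uri = URI.join(uri.to_s, location) # Location may be relative
  end
  uri.to_s
end
```

Capping the number of hops keeps a redirect loop between two URLs from spinning forever.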

This works great for FeedBurner URLs, but there is a problem that I just discovered yesterday. Some sites (including The New York Times) will sometimes redirect you to a login/registration URL before allowing you to view an article. This obviously throws a little hitch into my giddy up: my HEAD request responds with a Location header pointing to a login page, so I go ahead and do a blog search using the login URL. Sweet.
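I haven’t settled on a fix for this, but one possible hack is to refuse redirects whose target looks like a login or registration page and keep the original URL in that case. The sketch below is exactly that kind of heuristic: `AUTH_PATH`, `auth_redirect?`, and `choose_url` are hypothetical, and the keyword list is a guess rather than anything the Times or other sites document, so expect both false positives and misses.

```ruby
require "uri"

# Hypothetical keyword list; real sites use many other login/registration paths.
AUTH_PATH = %r{/(?:log-?in|sign-?in|register|registration|subscribe)(?:/|\.|\z)}i

# True when a redirect target's path looks like a login/registration wall.
def auth_redirect?(target)
  AUTH_PATH.match?(URI.parse(target).path.to_s)
end

# Keep the feed URL when the redirect appears to be a login wall;
# otherwise trust the Location header.
def choose_url(original, location)
  return original if location.nil? || auth_redirect?(location)
  location
end
```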

The more I deal with real-world web content, the more I realize that it is the wild west of data. Pretty much nothing is normalized. Wild hacks are the norm, because sometimes they are the only thing that works. If nothing else, at least this will force me to be seriously pragmatic in my day-to-day coding.