taylorbarstow.com

Hosting Review: Slicehost

I signed up for Slicehost today because I decided to move unbiasd over to their services. There were a few major reasons for the change (which may seem a bit drastic considering that we’ve only been with Rails Machine for a few months):

  • Cost: we are saving $152/month, a 44% savings
  • Uptime: this is based on word of mouth, but I’ve heard that Slicehost’s uptime is nearly impeccable. In our few months with Rails Machine, our VMs have gone down two or three times
  • Speed: I was able to configure my slices so that I get a far larger CPU share than I did with Rails Machine
  • Confidence: I started out with Rails Machine because I wanted to be able to fall back on their support team if I needed it (I am no expert when it comes to Rails deployment). It turns out that I have only used their support staff for routine stuff - restarting VMs that have gone down, for example - and not for anything specific to Rails. Plus, I am switching to Phusion Passenger, which makes deployment so simple a monkey could just about do it.

These are just my main reasons for switching. There are other benefits, too, like a simple automated backup system, and a great web interface for managing my slices.

Now, on to my review (admittedly, it may be a little early for one, but at least I can review the signup and install process).

Signup

The signup form is streamlined and simple. You give them your billing information and tell them what kind of slice you want and which OS. From there, a slice is automatically provisioned. In my case, it took less than a minute to receive confirmation that my slice had been created. Ten seconds later I was logged in as root via ssh. This is by far the fastest and simplest signup process I have ever seen.

Lock Down

Since Slicehost gives you a VM with root access, it’s really up to you to make sure it’s secure. That was pretty easy, since they provide a simple, easy-to-use guide for locking down a new slice. Obviously there are no cookie-cutter security situations, but they give you a great set of defaults that result in a pretty secure box.

Rails Stack

From there I needed to install the Rails stack - in my case, Apache, mod_rails, Ruby Enterprise Edition, MySQL, and Memcached. Slicehost provides guides for everything but Ruby Enterprise Edition and Memcached (and the guides are tailored to your OS, which is nice). REE and Memcached are really easy to install, so I didn’t miss guides for these very much at all.

Staging Environment

After setting up my production-box-to-be, I wanted to go ahead and set up my staging server. Although the production setup was really easy, it did take some time (four to six hours, I would say). This has nothing to do with Slicehost and everything to do with (a) my perfectionism, and (b) my slow, calculated approach to sysadmin tasks (which results from the fact that I am a programmer, not a sysadmin). The second time around would have been quicker (by far), but wouldn’t it be nice if I could just clone the box I had just set up instead?

Luckily Slicehost makes this super easy. First, I turned on backups for my production box (an extra $30/month, but something I wanted anyway). Then I took a snapshot, which took about 5 minutes to complete. Then I created a new VM, and selected my snapshot as the image (rather than a bare bones OS install). The build out took under a minute again. I logged in, tweaked some stuff specific to staging (changed the database name, for example, as well as Passenger’s RailsEnv setting), and everything worked!

So it took about 20 minutes to set up the staging environment.

Conclusions

So far, I am loving Slicehost! I can’t comment on uptime or customer service just yet, but so far this is the best experience I’ve had with setting up a VPS. Slicehost appears to be a best-of-breed hosting service, and I’m glad I found it. Thanks, Slicehost!

SEO Attacks

People have discovered many ways to disable, harm, or otherwise compromise websites and web services over the years. We have cross site scripting (XSS), denial of service (DoS), and phishing to name a few. Today I’m writing about one that I’m surprised hasn’t gotten more attention or, frankly, become more of a problem: so-called SEO attacks.

What’s an SEO attack? It’s a way of making a site appear less attractive and/or relevant to search engines. This actually isn’t at all hard to do. Anywhere you can post a public comment on a page, you can participate in an SEO attack.

At its worst, an SEO attack could be blind comment spamming, which would be easy for a site owner to detect and prevent. At its best, however, an SEO attack could be very difficult to trace indeed…

Attack #1: Profanity

Google’s SafeSearch filters out results containing profanity; I’ve experienced this with my own blog. (I’m not sure how other search engines handle this.) This makes an SEO attack extremely simple. Just post a public comment that is seemingly on-topic yet has a profanity slipped in there. If the comment gets posted, you just killed the page’s chances of showing up for anyone searching Google with SafeSearch turned on.

Attack #2: Link Spamming

We know that all web crawlers use hyperlinks to move from page to page and from site to site. The best web search algorithms also use hyperlinks to help determine a page’s authority on/importance to a given topic. You could pretty easily launch an SEO attack given this information. For example, you could post a comment with a link to a porn site, or to a known link farm.

Potential Harm

Could this really hurt someone’s business? Sure it could. A couple of sites (Stack Overflow and Amazon) jump to mind very quickly. Both of these derive significant benefit from Google searches (Stack Overflow especially, since it is still a baby). Any real cuts into search engine relevance would be immediately felt.

Solutions?

These approaches may seem absurdly simple, but in fact they are not easy to stop. Moderation works but doesn’t scale (Stack Overflow or Amazon could never handle moderation, for example). Automated filtering works for profanity, but not for link spamming.
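To make the profanity half of that claim concrete, here is a minimal Python sketch of a whole-word comment filter. The word list and the function name are hypothetical placeholders, and a real filter would also have to cope with misspellings and deliberate obfuscation, which this does not attempt.

```python
import re

# Placeholder word list (an assumption for this example); swap in a real one.
BLOCKED_WORDS = {"darn", "heck"}


def contains_blocked_word(comment, blocked=BLOCKED_WORDS):
    """Return True if any whole word in `comment` is on the blocked list."""
    # Lowercase the comment and split it into runs of letters and apostrophes,
    # so "DARN!" matches "darn" but "heckler" does not match "heck".
    words = re.findall(r"[a-z']+", comment.lower())
    return any(word in blocked for word in words)
```

A comment that trips the check could be held for manual review rather than rejected outright, which keeps the moderation load limited to the flagged cases.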

Conclusions

Uh, not sure I have any. SEO attacks seem like a potential problem, and I don’t see an easy solution. At the very least, I’ve concluded that this is a topic that warrants further investigation.

Here’s wishing you an SEO-safe site!

(Admittedly, SEO attacks are a particular type of XSS attack in that they involve injecting something harmful into someone else’s website. XSS attacks, however, more generally use some discrete technical vulnerability to do their work. SEO attacks are all about content which is publicly posted and visible to anyone, and they require zero technical know-how to launch.)

URL Canonicalization

Over at unbiasd, we have a URL canonicalization problem. You see, unbiasd brings together a bunch of news feeds from various websites and publishes selected articles from these feeds on the homepage. For each recently published article, we use Ice Rocket to search for back links to the article. This is how we power our “blog reactions” feature.

This works great—except for one problem. The article URLs given in feeds are not always the same URLs that bloggers would be linking to. FeedBurner is an obvious example of this. While some bloggers might mistakenly link to a FeedBurner URL, they probably meant to link to the actual article’s permalink instead.

The solution (in my mind) was simple: do a HEAD request on each URL before doing a search for blog reactions. If the HEAD responds with a Location header, then use the referenced URL rather than the original.
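A minimal Python sketch of that idea, assuming a single redirect hop exactly as described above. This is not unbiasd's actual code: the function names `head_location` and `canonical_url` are made up for illustration, and the lookup is passed in as a parameter so the logic can be exercised without touching the network.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit


def head_location(url, timeout=10):
    """Send one HEAD request (redirects are not followed) and return
    the Location header of the response, or None if there is none."""
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def canonical_url(url, fetch_location=head_location):
    """Return the redirect target for `url` if there is one, else `url`."""
    location = fetch_location(url)
    if not location:
        return url
    # Location may be relative, so resolve it against the original URL.
    return urljoin(url, location)
```

Calling `canonical_url(feed_item_url)` before the blog search would then swap a FeedBurner-style URL for whatever it redirects to, and leave non-redirecting URLs untouched.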

This works great for FeedBurner URLs, but there is a problem that I just discovered yesterday. Some sites (including The New York Times) will sometimes redirect you to a login/registration URL before allowing you to view an article. This obviously throws a little hitch into my giddy up: my HEAD request responds with a Location header pointing to a login page, so I go ahead and do a blog search using the login URL. Sweet.
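One possible guard, offered only as a heuristic and not as something unbiasd actually does: ignore the redirect when its target looks like a login or registration page. The hint words and function names in this Python sketch are assumptions, and an article whose own permalink happens to contain one of the hint words would be wrongly rejected.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical hint words; tune these against the sites actually crawled.
LOGIN_HINTS = {"login", "signin", "register", "registration", "auth"}


def looks_like_login_url(url):
    """Return True if any token in the URL's path is a login-style hint."""
    path = urlsplit(url).path.lower()
    # Split on anything that is not a letter or digit, so "/auth/login"
    # matches but "/authors/jane" does not.
    tokens = set(re.split(r"[^a-z0-9]+", path))
    return bool(tokens & LOGIN_HINTS)


def pick_search_url(original, location):
    """Choose which URL to use for the blog reactions search.

    `location` is the Location header from the HEAD request, or None.
    """
    if not location:
        return original
    target = urljoin(original, location)
    # Fall back to the feed's URL when the redirect points at a login wall.
    return original if looks_like_login_url(target) else target
```

Falling back to the original URL means a login-walled article is searched under the address the feed gave, which may miss some reactions but at least avoids searching for links to a login page.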

The more that I deal with real-world web content, the more I realize that it is the wild west of data. Pretty much nothing is normalized. Wild hacks are the norm, because sometimes they are the only thing that works. If nothing else, at least this will force me to be seriously pragmatic in my day-to-day coding.