taylorbarstow.com

SEO Attacks

People have discovered many ways to disable, harm, or otherwise compromise websites and web services over the years. We have cross-site scripting (XSS), denial of service (DoS), and phishing, to name a few. Today I’m writing about one that I’m surprised hasn’t gotten more attention or, frankly, become more of a problem: so-called SEO attacks.

What’s an SEO attack? It’s a way of making a site appear less attractive and/or relevant to search engines. This actually isn’t at all hard to do. Anywhere you can post a public comment on a page, you can participate in an SEO attack.

At its crudest, an SEO attack could be blind comment spamming, which would be easy for a site owner to detect and prevent. At its most careful, however, an SEO attack could be very difficult to trace indeed…

Attack #1: Profanity

Google’s SafeSearch definitely filters out results containing profanity; I’ve experienced this with my own blog. (I’m not sure how other search engines handle this.) This makes an SEO attack extremely simple. Just post a public comment that is seemingly on-topic yet has a profanity slipped in there. If the comment gets posted, you just killed the page’s chances of showing up in Google search results for anyone searching with SafeSearch turned on.

Attack #2: Link Spamming

We know that all web crawlers use hyperlinks to move from page to page and from site to site. The best web search algorithms also use hyperlinks to help determine a page’s authority on/importance to a given topic. You could pretty easily launch an SEO attack given this information. For example, you could post a comment with a link to a porn site, or to a known link farm.
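To make the link signal concrete, here is a minimal Python sketch of how the outbound links in a comment could be pulled out and compared against a list of known-bad hosts. The `BLOCKED_HOSTS` set and its `.example` domains are made up for illustration, and the function names are my own. Note that a check like this only catches hosts someone has already listed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blocklist (an assumption for this sketch); a real one
# would have to be built and maintained somehow.
BLOCKED_HOSTS = {"linkfarm.example", "spam.example"}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a fragment of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_hosts(comment_html):
    """Return the set of lowercase hostnames the comment links to."""
    collector = LinkCollector()
    collector.feed(comment_html)
    collector.close()
    hosts = set()
    for href in collector.hrefs:
        host = urlparse(href).hostname  # None for relative links
        if host:
            hosts.add(host.lower())
    return hosts


def flagged_hosts(comment_html, blocked=BLOCKED_HOSTS):
    """Return the linked hosts that appear on the blocklist."""
    return outbound_hosts(comment_html) & set(blocked)
```

Calling `flagged_hosts(comment)` on a submitted comment returns the subset of linked hosts that are on the list, so an empty set means nothing known-bad was found, not that the comment is safe.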

Potential Harm

Could this really hurt someone’s business? Sure it could. A couple of sites (Stack Overflow and Amazon) jump to mind very quickly. Both of these derive significant benefit from Google searches (Stack Overflow especially, since it is still a baby). Any real cut in search engine relevance would be felt immediately.

Solutions?

These attacks may seem absurdly simple, but in fact they are not easy to stop. Moderation works but doesn’t scale (Stack Overflow or Amazon could never handle moderation, for example). Automated filtering works for profanity, but not for link spamming.
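For the profanity half, automated filtering can be as simple as a word-list check run before a comment is published. Here is a minimal Python sketch. The `BANNED_WORDS` list is a mild placeholder and the function names are my own. Matching is on whole words, case-insensitively, so it will miss deliberate misspellings and obfuscations.

```python
import re

# Placeholder word list (an assumption); substitute a real list.
BANNED_WORDS = ["darn", "heck"]


def build_pattern(words):
    """Compile one case-insensitive regex matching any word as a whole word."""
    escaped = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + escaped + r")\b", re.IGNORECASE)


BANNED_PATTERN = build_pattern(BANNED_WORDS)


def find_banned(comment):
    """Return the sorted, lowercased banned words found in the comment."""
    return sorted({m.group(0).lower() for m in BANNED_PATTERN.finditer(comment)})


def is_clean(comment):
    """True if the comment contains none of the banned words."""
    return BANNED_PATTERN.search(comment) is None
```

A comment form could call `is_clean(text)` and hold anything that fails for review. The word-boundary match keeps a word like "darning" from tripping the filter.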

Conclusions

Uh, not sure I have any. SEO attacks seem like a potential problem, and I don’t see an easy solution. At the very least, I’ve concluded that this is a topic that warrants further investigation.

Here’s wishing you an SEO-safe site!

(Admittedly, SEO attacks resemble XSS attacks in that they involve injecting something harmful into someone else’s website. XSS attacks, however, generally exploit some discrete technical vulnerability to do their work. SEO attacks are all about content which is publicly posted and visible to anyone, and they require zero technical know-how to launch.)

Test Awareness Month

It’s Test Awareness Month, folks, and I guess I’d better do my part. For one thing, let me echo Bryan Liles and say that you should Test All The Fucking Time. I’m not going to write a big monologue on testing (I love it, and you should too), but here’s a super useful link if you are wondering how to go about testing Rails apps (it’s not always obvious; how should one test Active Record relationships, for example?).

Happy testing!

The Adobe AIR Socket Implementation Sucks

AIR sockets write data asynchronously—that is, the data you send to the socket goes into a buffer and the socket implementation decides when to actually write it out. Problem is, there is no way of knowing how much data has been written vs. how much is still in the buffer—no event to subscribe to, no bytesWritten property. There is a flush method on the socket, but it returns immediately without blocking so you can’t use this to force a synchronous write. (Since AIR doesn’t support multiple threads yet, writing synchronously isn’t really an option anyway.)
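To show the kind of bookkeeping I am missing, here is a minimal sketch in Python (not ActionScript, and not AIR's API) of a writer on top of a non-blocking socket that knows how much has been handed to the OS and how much is still queued. The `TrackedWriter` class and its names are made up for illustration. Note that "written" here means accepted by the OS send buffer, not received by the other end.

```python
import socket


class TrackedWriter:
    """Queues outgoing bytes and tracks how many the OS has accepted."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.pending = bytearray()  # queued but not yet handed to the OS
        self.bytes_written = 0      # total bytes accepted by the OS so far

    def write(self, data):
        """Queue data for sending; nothing is sent yet."""
        self.pending.extend(data)

    def flush_some(self):
        """Send as much queued data as possible without blocking.

        Returns the number of bytes the OS accepted during this call.
        """
        sent_now = 0
        while self.pending:
            try:
                n = self.sock.send(self.pending)
            except BlockingIOError:
                break  # OS send buffer is full; try again later
            del self.pending[:n]
            sent_now += n
        self.bytes_written += sent_now
        return sent_now
```

With something like this, `len(writer.pending)` and `writer.bytes_written` answer exactly the two questions the AIR socket cannot. In a real program `flush_some()` would be called whenever the socket reports it is writable.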

WTF Adobe? This is a glaring omission. Adrian raises an interesting question: did they actually do this on purpose? How hard could this possibly be?! I mean, they are simply exposing an underlying socket implementation provided by the OS, right? And every socket implementation on the block can tell you how much data is in the stupid buffer. grrrr.

Someone posted this as a bug on AIR’s issue tracker back in April. They didn’t get a response until August. Want to know what it said? “This is an enhancement request, i’ve logged it internally.” Sweet guys, thanks for the update.

Alas, it’s back to platform shopping for me. I hope Adobe can figure this out. Not only is it annoying, it makes them look downright stupid. Oh well.