Search Accessibility: a how-to
Posted: June 4th, 2007 | Author: Laurie | Filed under: accessibility, search accessibility, seo, webdev
In my last post, I covered the politics of search accessibility, and why making your site available to all users is above all the profitable thing to do, without considering whether it’s the right thing. So now I’m going to cover how to make your site search accessible.
Please Feed the Spider
The program that runs around the Internet reading every single page and throwing it into Google’s* giant database is GoogleBot (Yahoo!’s is called Slurp). GoogleBot is your best friend, your worst enemy, your teddy bear and your mommy all rolled into one. GoogleBot is a very, very clever piece of software, but it’s not magical. Here is what GoogleBot does:
- It reads the text on your page and looks for “important” words and phrases
- It reads the links on your page and sees what pages you’re linking to
- It reads the links on the rest of the Internet and looks for pages that link to you
- It then calculates how relevant your page is to the words in it, based on the words on the pages that link to it, and how relevant they are based on other sites, and so on
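To make that last step a bit more concrete, here is a toy sketch in Python of a link-based score, loosely in the spirit of the publicly described PageRank idea. It is not GoogleBot's actual calculation, which Google keeps to itself. The function name `link_scores`, the damping value of 0.85, the number of rounds and the four-page "web" are all assumptions made up for illustration.

```python
def link_scores(links, damping=0.85, rounds=50):
    """Toy link-based score: a page scores well when well-scored pages link to it.

    `links` maps each page to the list of pages it links to (hypothetical data).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(rounds):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly across its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical mini-web: three pages link to "yoursite", which links to one of them.
web = {
    "blog-a": ["yoursite"],
    "blog-b": ["yoursite", "blog-a"],
    "forum": ["yoursite"],
    "yoursite": ["blog-a"],
}
scores = link_scores(web)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this made-up web the page with the most incoming links ends up on top, and the page it links to gets a boost as well, which is the "and so on" part of the calculation.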
Key take-home: it’s all about keywords and links. It is all about text. Attractive design and a witty site slogan and pictures of bikini models holding your product count for naught. As I mentioned in my last post, Google is in effect a disabled user using only the most basic of assistive technologies:
- It cannot see your images
- It will not execute JavaScript. Not any. (Real disabled users can often do JavaScript using better software these days)
- It’s not reading every bit of text on your page. It’s looking for the important words. And it’s in an almighty hurry.
- It does not follow links that do not look like web pages.
- It does not magically work out what your site is about. You need to make it obvious.
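If you want a rough feel for what that looks like, here is a minimal Python sketch of a text-only view of a page. It uses the standard library's `html.parser` as a stand-in; the class name `SpiderView` and the sample page are invented for illustration, and a real spider is far more sophisticated than this.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collects roughly what a text-only visitor gets from a page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img":
            # The image itself is invisible; only its ALT text survives.
            alt = dict(attrs).get("alt")
            if alt:
                self.words.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep visible text only; nothing inside a script is ever run.
        if not self.skip_depth and data.strip():
            self.words.append(data.strip())


# Hypothetical page: one image without ALT text, one with, and a scripted offer.
page = """
<html><body>
<h1>Hand-made teddy bears</h1>
<img src="bear.jpg">
<img src="logo.png" alt="Bear Co. logo">
<script>document.write("Half-price bears this week!");</script>
<p>Every bear is stitched by hand.</p>
</body></html>
"""

view = SpiderView()
view.feed(page)
text = " ".join(view.words)
print(text)
# prints: Hand-made teddy bears Bear Co. logo Every bear is stitched by hand.
```

The photo with no ALT text and the offer written out by JavaScript simply never show up.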
Already, some of the key things you need to do for SEO are obvious, in order of importance:
- Link text is important. Every time I see a link saying “click here” in 2007 it makes me want to weep. Link text is, above anything else, how Google decides what the page you’re linking to is about, and working out what you link to is how it works out what your own page is about. Outgoing links aren’t silly, they’re essential.
- All your information must be accessible with JavaScript disabled, because that’s how Google sees the page.
- All your links must go to real information. “#” links are ignored, as are “javascript:..”. And the information on those pages must be relevant to the link text, obviously: don’t just link back to your home page.
- Any images must be described as text somewhere on the page, either in the ALT attribute or using some other technique**. Google ignores images (even image search is based on the text nearby).
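The link rules above are easy to check mechanically. Here is a small Python sketch, again using the standard `html.parser`, that keeps only the links a spider could follow and flags vague link text. The class name `LinkReader`, the `VAGUE` word list and the sample URLs are my own assumptions for illustration, not anything Google publishes.

```python
from html.parser import HTMLParser

# Hypothetical list of link texts that say nothing about the target page.
VAGUE = {"click here", "here", "more", "read more"}


class LinkReader(HTMLParser):
    """Collects (href, link text) pairs, skipping links that go nowhere."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.href = None  # href of the open <a>, if it is followable
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = (dict(attrs).get("href") or "").strip()
            followable = bool(href) and not (
                href.startswith("#") or href.lower().startswith("javascript:")
            )
            self.href = href if followable else None
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            link_text = " ".join("".join(self.text).split())
            self.links.append((self.href, link_text))
            self.href = None


page = """
<a href="#">Top</a>
<a href="javascript:openWindow()">Our prices</a>
<a href="/bears/price-list.html">Teddy bear price list</a>
<a href="/bears/care.html">click here</a>
"""

reader = LinkReader()
reader.feed(page)
vague = [(href, text) for href, text in reader.links if text.lower() in VAGUE]
print(reader.links)
print(vague)
```

Only two of the four links survive, and one of those two tells the spider nothing about the page it points to.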
What’s an Important Word?
It’s important to know what Google considers an “important” word. Google is more than a little secretive about this, but Google has its own guidelines for site design and professional, non-evil SEO people have their own search accessibility guidelines. My own very subjective impression, from several years of experience, is that the most important words on your page are:
- The link text going to your page. Nothing you can do about this but be a very good website, and hope people link to you. You can do your bit by linking to other people with sensible keywords, of course, and hoping they link back — but trading links explicitly is something GoogleBot is designed to detect. And it’s been spotting fakers a lot longer than you’ve been faking, so I don’t recommend trying to fake it.
- The page title. Don’t repeat your site name and slogan endlessly: say what this page in particular is about. Put keywords in there! It’s also what users see on the search results page, so make sure it makes sense to human beings.
- The meta description tag. This is an odd one: Google doesn’t pay too much attention to it in calculating relevance, but at a certain level of relevance, this is what it puts as the text under your page title in the search results, where it suddenly becomes very important to users who are about to click your link. So it’s important that this text is descriptive, useful, and short — something under 100 words. And again, load up on keywords. Repeat yourself, phrasing the same thing several different ways.
- H1 and H2 tags. H3 is dicey and everything beyond that is meaningless. H1s in particular are super important, but only because they are rare. If everything on your page is a goddamn H1 obviously Google is going to ignore you. Use 1 or 2 H1s, and fewer than 10 H2s.
- ALT attributes on images. This is way down the list, so if you have really important text in your images, it’s best to use the technique I outlined in the second footnote so that it turns up in an H1 or H2.
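A quick way to audit a page against the list above is to pull out the title, the meta description and the headings, then apply the rules of thumb from this post. This Python sketch does that; the thresholds are my own subjective ones from above, not published Google rules, and the names `ImportantWords` and `check` plus the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class ImportantWords(HTMLParser):
    """Pulls out the title, meta description, and H1/H2 text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings = {"h1": [], "h2": []}
        self.current = None  # tag whose text is being collected
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag in ("title", "h1", "h2"):
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            text = " ".join("".join(self.buffer).split())
            if tag == "title":
                self.title = text
            else:
                self.headings[tag].append(text)
            self.current = None


def check(page_html):
    """Applies this post's rules of thumb and returns any problems found."""
    found = ImportantWords()
    found.feed(page_html)
    problems = []
    if not found.title:
        problems.append("no page title")
    if not found.description:
        problems.append("no meta description")
    elif len(found.description.split()) >= 100:
        problems.append("meta description is 100 words or more")
    if not 1 <= len(found.headings["h1"]) <= 2:
        problems.append("use 1 or 2 H1s")
    if len(found.headings["h2"]) >= 10:
        problems.append("use fewer than 10 H2s")
    return found, problems


page = """
<html><head>
<title>Teddy bear price list - Bear Co.</title>
<meta name="description" content="Prices for hand-made teddy bears, from pocket bears to giant bears.">
</head><body>
<h1>Teddy bear price list</h1>
<h1>Welcome!</h1>
<h1>Bears!</h1>
<h2>Pocket bears</h2>
</body></html>
"""

found, problems = check(page)
print(found.title)
print(problems)
```

The sample page has a sensible title and description but three H1s, so the only complaint is about the headings.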
Order is important, or, Don’t use Tables
Another aspect of your page that is extremely important to Google is source code order: literally, the order things appear in your source. Things that appear early on are likely to be more important than things that appear later. That seems obvious, right? But now look at your code: you’ve got the head, full of juicy meta data, and then you’ve got 5k of navigational elements, sidebar text, various other cruft, just placed first because you were using a left-floated column and so it was easier to put it there. This is killing you.
What’s much worse is when your source code order physically separates content that is semantically related: for instance, your headline is at the top of your page, then you have 5k of navigational cruft, then you have your content. Google will either fail to realise that your headline is describing your content, and thus not link the words, or worse, it will decide that your page doesn’t actually have any content on it relating to your headline, and you’re trying to spam it. Danger, Will Robinson!
And of course the number one offender from this perspective is using tables for layouts. If you care about web development, you’re probably aware that tables have serious issues with flexible, attractive layouts. That’s usually not a good enough reason to take to your boss, though: after all, it doesn’t bother her that your job is hard. But tell her that using tables is causing an 80% drop in traffic to your site (as I explained in the last post) and suddenly you have an easy, obvious business case for reworking the layout of your code.
Tables put data into a grid layout. If your data is in columns (and it frequently is), this means you often end up with a site code layout that looks like this:
Site logo | Article headline
- List    | Article body
- of      |
- nav     |
- links   |
To Google this reads like:
- Site logo
- Article headline
- List
- Of
- Nav
- Links
- Article body
You can see why Google might get confused. So examine your code, and put things in order of importance: you can use CSS to move stuff around on the page later. Coincidentally, source code order is also the order in which screen readers will read out your page to a blind user. So once again there’s a useful coincidence: you make your site accessible when you make it search accessible.
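Here is a small Python sketch of that flattening, comparing a table layout with a content-first layout. Both snippets of markup, the `id` values and the names `SourceOrder` and `reading_order` are invented for illustration; the parser is the standard library's `html.parser`, standing in for a spider or a screen reader.

```python
from html.parser import HTMLParser


class SourceOrder(HTMLParser):
    """Records the text chunks of a page in the order they appear in the source."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def reading_order(page_html):
    reader = SourceOrder()
    reader.feed(page_html)
    return reader.chunks


def gap(chunks, first, second):
    """Number of chunks sitting between two pieces of text."""
    return chunks.index(second) - chunks.index(first) - 1


table_layout = """
<table>
  <tr><td>Site logo</td><td><h1>Article headline</h1></td></tr>
  <tr><td><ul><li>List</li><li>of</li><li>nav</li><li>links</li></ul></td>
      <td><p>Article body</p></td></tr>
</table>
"""

# Same content, most important things first; CSS would position it on screen.
content_first = """
<div id="content"><h1>Article headline</h1><p>Article body</p></div>
<div id="logo">Site logo</div>
<ul id="nav"><li>List</li><li>of</li><li>nav</li><li>links</li></ul>
"""

table_order = reading_order(table_layout)
content_order = reading_order(content_first)
print(table_order)
print(gap(table_order, "Article headline", "Article body"))    # prints 4
print(gap(content_order, "Article headline", "Article body"))  # prints 0
```

In the table version four nav links sit between the headline and the body it describes. In the content-first version nothing does.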
Of course — and I would have thought this was obvious, but I get questions about it that indicate to the contrary — you can use tables when the data is tabular. Don’t try to mark up your spreadsheet data using a series of stacked lists. Tables have real semantic meaning, but it has been diluted almost beyond help by their consistent misuse.
There is more I could tell you about SEO (the various hazily-defined statistical rules about how many links on a page is too many, optimal keyword density, and more), but these advanced techniques are icing on the cake, and the cake is made of search accessibility. It doesn’t matter what your keyword density is if Google can’t even get to your pages. So get out there and make the case for accessibility. And when the traffic is rolling in and your boss is giving you your huge bonus, you can get a tiny little extra bit of joy from knowing your site is also accessible to disabled users.
* When I say Google, obviously I mean Yahoo!, Ask and all the other major search engines as well. They all work the same way. If Google didn’t want me to use their name to mean all search engines, they shouldn’t have made it a verb.
** For important text like headlines, it’s often better to put the text into the page directly in a semantically-meaningful element (like H1, H2, etc), make the text transparent, and then put the nicely-styled image in as a background image. This makes no difference to what your users see but it makes the words look a lot more “important” to Google.