On Apple, the iPhone, and web apps

Posted: April 12th, 2010 | Author: Laurie | Filed under: apple, iphone, webdev

I’ve written about Apple’s ban on intermediate platforms, and what this means for web apps over on my main blog.


The Serial VCS Fuckup’s Guide To Git

Posted: March 31st, 2010 | Author: Laurie | Filed under: git, vcs, webdev

I suck at VCS. There, I’ve said it. I love the concept, I can see why it’s useful, I require no persuasion that I should be doing all my feature development in hundreds of little branches and pushing them back and forth at will. However, when I attempt to put those plans into action, I routinely fuck everything up and end up deleting long lines of “>>>>>” and accidentally breaking shit. It’s embarrassing.

So I’m determined not to do this with git. I will master how branching and merging works in git. And, because I drink too much caffeine and can therefore never remember anything, I will write it down in terms the developmentally-challenged could understand, because that’s the kind of documentation that works for me. And maybe it will work for you, too.

I’m writing this down as I go. When I fuck stuff up, I will leave it in, on the basis that finding out why stuff doesn’t work is at least as instructive as a perfect happy-path demonstration of everything working. This is a work in progress; I will add new things when I learn how to do them.

Creating a repository

I’m going to assume you created a repo on github, using the GUI. Because that’s what I did. I’ve called it “git-training”. It won’t matter for a while though. For now, on your local machine, run:


> mkdir git-training
> cd git-training
> git init

You’ve created a directory and told git to treat it as the root of a repository. That’s all. Nothing’s in it yet, and it’s not connected to anything. The beauty of git is that you can start committing before needing to do any of that — in fact, if you felt like it, you could go on forever without pushing to github. That’s the “distributed” part of git. Git treats the local repo as authoritative and the server as optional. You can see what’s going on by typing


> git status

# On branch master
#
# Initial commit
#
nothing to commit (create/copy files and use "git add" to track)

As you can see, nothing’s happened yet, and it’s suggesting your next step. It also says you’re on branch “master”, which is the default for the initial branch and is not magical in any way. It’s called “master”, but it could be called “walrus”. Don’t get hung up about that. So let’s do what it suggested, and create and add a file:


> nano README

(Or vim or whatever. I use nano because it’s the editor that doesn’t require me to know random key combinations in advance. Stop judging me.)

Put something in the file. Github likes READMEs.


> git add README

Git now knows about this file, and will track what’s going on with it.


> git status

# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: README
#

“Initial commit” means it knows you’ve never committed anything before. It’s listing the files you could commit. There’s one, and it knows about it already because you added it. Let’s see what happens if you create a second file, without adding it:


> nano secondfile.txt
>> and, you know, put something in it.
> git status

# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: README
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# secondfile.txt

Now there’s a difference between these files: it can see both, but only one is set up to be committed. Let’s see what happens if you try to commit:


> git commit -m "first commit"

Created initial commit ef437ec: first commit
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 README

Note that you’ve not said what file to commit, so it commits everything you’ve staged with “git add”: README, but not secondfile.txt. It doesn’t care that you’ve not told it about secondfile.txt. It will keep leaving it out of commits until you add it.

The random-looking string is a unique ID for the commit (an abbreviated hash). The file-changes stuff is self-explanatory, and “create mode 100644” just means a new, ordinary, non-executable file was created. Let’s find out where this has left us:


> git status

# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# secondfile.txt
nothing added to commit but untracked files present (use "git add" to track)

There’s secondfile.txt, gleefully ignored. status won’t tell us about README, because it’s already committed as far as local git is concerned: local git considers itself just as authoritative as the upstream server — which, at this point, it hasn’t even heard about.
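If you want to convince yourself of the difference between “added” and “just lying around”, here is a minimal sketch you can run somewhere harmless. The temporary directory, the file contents and the made-up identity are assumptions for the demo, not part of the walkthrough above.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity, only for this scratch repo
git config user.email "example@example.com"

echo "hello" > README
echo "not yet" > secondfile.txt
git add README                             # stage README only
git commit -q -m "first commit"

tracked=$(git ls-files)                                  # what went into the commit
untracked=$(git ls-files --others --exclude-standard)    # what git can see but was never told about
echo "tracked:   $tracked"
echo "untracked: $untracked"
```

README should show up as tracked and secondfile.txt as untracked, which is the same story “git status” is telling above.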

So let’s take our local master and push it to github.


> git remote add origin git@github.com:seldo/git-training.git

This is quick, because all you’re doing is telling local git that the remote git repository exists (in git@github.com, “git” is the standard SSH username shared by everybody and github.com is the host; seldo is my personal namespace within github, and git-training is my specific repository). You’ve also given it a name: “origin”. Again, nothing magical about this name, it’s just a convention. Call it “hippo” if that floats your boat.

Now we want to push the local repository to the remote one. To do this, you’re going to need github to know about your local SSH key. Github has really great instructions on how to do this, and will automatically send you to the ones for your own operating system. So go there and do that, and then come back.


> git push origin master

When I first started, this confused the hell out of me, but since I’ve given you some background it should be obvious what you’re doing. The first argument to push is the destination (despite the name) and the second is the source. You’re saying “push from branch ‘master’ to remote repo ‘origin’”. The output looks like this:


Counting objects: 3, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 268 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
* [new branch] master -> master

Again, I’d be lying if I said I knew what all of this crap means, but there you go: it’s on github now, in a branch called “master” (remember, “origin” is just your local label for the remote repository, not a branch name).
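If you’d like to rehearse the push without involving github at all, a rough sketch follows. It assumes a local bare repository can stand in for the github one; the paths and the identity are made up for the demo.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"      # local bare repo standing in for github (assumption)
git init -q "$base/local"
cd "$base/local"
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"

echo "hello" > README
git add README
git commit -q -m "first commit"

git remote add origin "$base/remote.git"   # "origin" is only a label for the remote
git push -q origin master                  # destination first, then the branch to send

local_tip=$(git rev-parse master)
remote_tip=$(git ls-remote origin refs/heads/master | cut -f1)
echo "local:  $local_tip"
echo "remote: $remote_tip"
```

If the two IDs match, the remote’s “master” is pointing at the same commit as yours, which is all a successful push really means.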

Woo! You’ve got stuff into the repository. Go you. But don’t stop here. There’s more to learn. If you’re ready to start coding properly, then go to “creating a feature branch”. If you’re trying to hack on something that already exists, then you want to read this next section.

Cloning a repository

Say somebody else did all the above for you already. You need to work on a copy of the code, like RIGHT NOW. You don’t want to look like an idiot. Get started! On a fresh machine (or in a subdirectory somewhere away from the one where your current git-training directory is), just run:


> git clone git@github.com:seldo/git-training.git

remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.

Woo! Now you have a directory called git-training, and inside of it is the README file you pushed up earlier.

In my own testing, I put this inside a new directory called “son-of-git”, so that I could have both on the same machine without conflicting.

Hooray! Now you’re ready to edit code!

Creating a feature branch

So you’ve either created or cloned a repository, and now you want to edit code. Top tip: don’t do it in the master branch. You want your master branch to be pure and full of final, production-ready code at all times. At the same time, you don’t want to go more than a few hours without committing the changes you’re making to your feature. So create a branch, and commit to it regularly, and make sure everybody else does too. If you need to collaborate with somebody, don’t share a branch; instead, just pull from their branch into yours.

So first, let’s look at where we are:


> git branch

* master

Just one branch, master, and it’s got the * to tell you it’s selected. Let’s create and switch to a new branch:


> git branch feature1

This creates the branch. Dunno why they couldn’t have thrown the word “create” in there, especially since “git branch” by itself lists branches, but hey, that’s command-line software for you. Rationality is for wimps.


> git checkout feature1

This switches to the branch. Again, not a very intuitive command name.
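For what it’s worth, the two steps can be collapsed into one with “git checkout -b”. The sketch below runs in a scratch repository (the temp directory, identity and the extra branch name feature2 are assumptions for the demo). Reasonably recent versions of git also offer “git switch -c” for the same job, if yours has it.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "hello" > README
git add README
git commit -q -m "first commit"

git branch feature1                        # create the branch...
git checkout -q feature1                   # ...then switch to it
git checkout -q -b feature2                # create AND switch in a single step

current=$(git rev-parse --abbrev-ref HEAD)
branches=$(git for-each-ref --format='%(refname:short)' refs/heads | xargs)
echo "current branch: $current"
echo "all branches:   $branches"
```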


> git branch

* feature1
master

Woot. Two branches listed, and we’re in feature1 (the star again). Let’s do some changes.


> mkdir feature1files
> cd feature1files
> nano file1.txt
>> insert "I am file 1, version 1"
> nano file2.txt
>> insert "I am file 2, version 1"

Where has this left us?


> cd ..
> git status

# On branch feature1
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# feature1files/

Okay! Git has spotted the new folder. We need to add the files. You can go about this in two ways. Either tell it about each file specifically (you don’t need to add the parent directory first):


> git add feature1files/file1.txt
> git status

# On branch feature1
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: feature1files/file1.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# feature1files/file2.txt

Or you can add an entire directory at once, and git is smart enough to get everything in it:


> git add feature1files
> git status

# On branch feature1
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: feature1files/file1.txt
# new file: feature1files/file2.txt

Now you’re done for the day. Don’t hesitate: commit that shit!


> git commit -m "I worked on feature 1"

Created commit 595dc85: I worked on feature 1
2 files changed, 2 insertions(+), 0 deletions(-)
create mode 100644 feature1files/file1.txt
create mode 100644 feature1files/file2.txt

Again, this is super-quick, because it’s just talking to the local git repository, not the remote one. If you look on github right now, nothing has changed, because you haven’t pushed yet. Let’s try that now:


> git push origin master

Everything up-to-date

Wait, why did that happen? Look at what you asked it to do: “push from branch ‘master’ to remote repo ‘origin’”. And of course, on the branch “master”, nothing has changed: you’re in the branch called “feature1”. What you want to do is push from “feature1” to the remote repo, so instead you need to do:


> git push origin feature1

Counting objects: 6, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 393 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
595dc85..33c25f2 feature1 -> feature1

So now go look at github. You’ll see instead of 1 branch, it now lists 2: master and feature1. Your feature is up in the cloud, safe and sound, and somebody else could check it out and use that feature in their own dev box. But you’ve not buggered up the master. Amazing!

The ability to push changes from any branch, no matter which branch you’re in, is one of the amazing parts of git, but to a subversion or CVS user it’s also one of the most confusing.

Even more amazing, you can now switch to another branch, instantly. Try it:


> git checkout master

Do an ls and suddenly, no directory called feature1files — but all is well!


> git checkout feature1

Aaaand it’s back. No muss, no fuss, and no talking to the server. Remember, the local git repository is the authority. It knows about all the branches you’ve dealt with and their current state. It only ever needs to talk to the server to get stuff other people have done, or to push your changes there. Again, this is a big mental adjustment you need to make.

To emphasize this, let’s do something that, if you’re used to centralized version control, seems a little wacky: switch between branches without telling git about a new file. First create one:


> cd feature1files
> nano file3.txt
>> and put something in it

Without adding it, let’s switch to the master branch and see what happens:


> git checkout master

Directory “feature1files” is still with us, but only 1 file is in it: file3.txt. That’s because git doesn’t know what to do with file3, but it’s certain that file1 and file2 aren’t in master. So switch back to the branch:


> git checkout feature1

Add the file:


> git add file3.txt

At this point, you can still switch back and forth between branches and file3.txt will come with you. But commit the file:


> git commit file3.txt -m "added a new file"

Created commit 33c25f2: added a new file
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 feature1files/file3.txt

The file3.txt argument isn’t necessary here: a plain “git commit” would have committed whatever you’d added. Now if you switch:


> git checkout master

No feature1files directory! Because now git knows where everything is, and it’s not in master. If you look at github, you won’t see file3 there either — it’s inside git, locally. You can get back there quickly:


> git checkout feature1

Woo, everything is back!
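Here is the whole file3.txt experiment condensed into one sketch, run in a scratch repository so it can’t hurt anything. The temp directory and identity are assumptions; the file names follow the ones used above.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "hello" > README
git add README
git commit -q -m "first commit"

git checkout -q -b feature1
mkdir feature1files
echo "I am file 1, version 1" > feature1files/file1.txt
git add feature1files
git commit -q -m "I worked on feature 1"

echo "I am file 3" > feature1files/file3.txt   # created but never added
git checkout -q master
untracked_on_master=$(ls feature1files)        # the untracked file tags along

git checkout -q feature1
git add feature1files/file3.txt
git commit -q -m "added a new file"
git checkout -q master                         # now git owns file3.txt, so it goes away
if [ -d feature1files ]; then dir_on_master="present"; else dir_on_master="gone"; fi

git checkout -q feature1
files_on_feature1=$(ls feature1files | xargs)  # and everything comes back
echo "on master before commit: $untracked_on_master"
echo "directory on master after commit: $dir_on_master"
echo "on feature1: $files_on_feature1"
```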

Merging a feature back into the master

Okay, enough messing about. You’re ready to merge your feature back into the master, because it’s ready for production. It’s pretty easy. First, get to the master branch:


> git checkout master

And merge in the feature:


> git merge feature1

Remember, you’re merging locally here. Nothing is happening to anything but your local copy, and git knows about all the versions of everything, so you can always undo this.
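One way to actually do that undo, sketched in a scratch repository rather than your real one: git records where the branch pointed before the merge under the name ORIG_HEAD, and “git reset --hard ORIG_HEAD” moves the branch back there. The temp directory and identity are assumptions for the demo, and note that a hard reset also throws away any uncommitted changes.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "hello" > README
git add README
git commit -q -m "first commit"
before=$(git rev-parse master)             # where master pointed before the merge

git checkout -q -b feature1
mkdir feature1files
echo "I am file 1, version 1" > feature1files/file1.txt
git add feature1files
git commit -q -m "I worked on feature 1"

git checkout -q master
git merge -q feature1                      # local only, nothing is sent anywhere
after_merge=$(git rev-parse master)

git reset -q --hard ORIG_HEAD              # put master back where it was before the merge
after_undo=$(git rev-parse master)
echo "before merge: $before"
echo "after merge:  $after_merge"
echo "after undo:   $after_undo"
```

The feature1 branch itself is untouched by the undo, so you can merge it again whenever you like.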

Now feature1files has turned up. As far as local git is concerned, you’re done. The branches are merged (unless you had some conflicts, which we’ll deal with later). You can switch to some new feature branch now if you want:


> git branch feature2
> git checkout feature2

This new branch already has feature1 in it. If you switch back to master, you’ll see a warning:


> git checkout master

Switched to branch "master"
Your branch is ahead of the tracked remote branch 'origin/master' by 3 commits

Git knows about the remote repo called “origin”, which has a branch called ‘master’. It’s letting you know you’ve not pushed there. Let’s do that now:


> git push origin master

Counting objects: 6, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 367 bytes, done.
Total 4 (delta 1), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
ef437ec..6ebc96e master -> master

Nice! A refresh over at the github site will now show you feature1 files in the master. But what if you wanted to do the opposite, and pull changes from the master to be merged into your local copy? That’s a section after this one, but first…

Pulling changes down into your working copy

Remember back at the beginning, when we created the repo in the first place, either on another machine, or just some other directory? You’ve been happily building new features over in your cloned copy (“son of git”), and ignoring the original (let’s call it “git the first” to avoid future confusion). Time to catch up! Let’s try the dumb way first:


> git pull

remote: Counting objects: 14, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 13 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (13/13), done.
From git@github.com:seldo/git-training
* [new branch] feature1 -> origin/feature1
ef437ec..6ebc96e master -> origin/master
You asked me to pull without telling me which branch you
want to merge with, and 'branch.master.merge' in
your configuration file does not tell me either. Please
name which branch you want to merge on the command line and
try again (e.g. 'git pull <repository> <refspec>').

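Blah, blah, blah. All hell has broken loose. The reason is in the error: this repository was created with “git init” rather than “git clone”, so nobody ever told the local “master” which remote branch it should merge from (that’s the 'branch.master.merge' setting it’s complaining about). And because there’s no magic involved in branch names, git won’t assume that the remote “master” is any more special than “feature1”. Git needs you to say which remote and which branch you want to pull from. So let’s tell it: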


> git pull origin master

From git@github.com:seldo/git-training
* branch master -> FETCH_HEAD
Updating ef437ec..6ebc96e
Fast forward
feature1files/file1.txt | 1 +
feature1files/file2.txt | 1 +
feature1files/file3.txt | 1 +
3 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 feature1files/file1.txt
create mode 100644 feature1files/file2.txt
create mode 100644 feature1files/file3.txt

This translates to “pull the branch ‘master’ from remote repo ‘origin’ and merge it into the branch I’m currently on”, which here is the local ‘master’. It does this with a minimum of fuss. Neat!
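If typing the remote and branch every time gets old, you can record the pairing once and then a bare “git pull” works. The sketch below uses “git push -u” to do that; it assumes a local bare repository standing in for github, plus a made-up identity, and it assumes your git configuration doesn’t already set up tracking automatically.

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"      # local bare repo standing in for github (assumption)
git init -q "$base/local"
cd "$base/local"
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "hello" > README
git add README
git commit -q -m "first commit"
git remote add origin "$base/remote.git"
git push -q origin master                  # plain push: no tracking information is recorded

before=$(git config --get branch.master.merge || echo "unset")
git push -q -u origin master               # -u remembers origin/master as master's upstream
after=$(git config --get branch.master.merge)
tracked_remote=$(git config --get branch.master.remote)

git pull -q                                # no arguments needed any more
echo "branch.master.merge before: $before"
echo "branch.master.merge after:  $after"
```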

Now we can make a bugfix to our master copy, and even get around to adding that “secondfile.txt” we created way back in the day.


> nano feature1files/file1.txt
>> change "version 1" to "version 2"
> git add secondfile.txt
> git status

# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: secondfile.txt
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: feature1files/file1.txt
#

Okay, that’s what we wanted to change, so let’s commit it:


> git commit -m "bugfix on master"

1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 secondfile.txt

Hmmm, only one file? Let’s see what happened:


> git status

# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: feature1files/file1.txt
#

Hmmm, that’s not what we were expecting… file1.txt is already part of the repository, why didn’t our commit get it? Because git only commits what you’ve explicitly told it to commit, with “add”. Earlier, we added everything, so commit got everything. So let’s add file1 to this commit, and commit again:


> git add feature1files
> git commit -m "bugfix, round 2"

Created commit b645bd3: bugfix, round 2
1 files changed, 1 insertions(+), 1 deletions(-)

> git status

# On branch master
nothing to commit (working directory clean)

Woo! Now it’s all working. Just to make sure we understand how that works, let’s do it again. Change one file, and create another:


> nano secondfile.txt
>> append "version 2"
> nano thirdfile.txt
> git add secondfile.txt
> git commit -m "second file only"

Created commit a62725d: second file only
1 files changed, 1 insertions(+), 1 deletions(-)

> git status

# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# thirdfile.txt

The change to secondfile went in; the brand-new thirdfile did not, because we never added it. Let’s push our change to secondfile up to the server, so we can use it in our next lesson:


> git push origin master

Counting objects: 13, done.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (10/10), 1.01 KiB, done.
Total 10 (delta 1), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
6ebc96e..a62725d master -> master

And a quick peek at github’s master branch shows secondfile.txt has turned up.
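A footnote on the add-then-commit dance in this section: “git commit -a” stages every modified file that git already tracks before committing, which saves the separate “git add” for edits. It still won’t pick up brand-new files. A small sketch in a scratch repository (temp directory, identity and file contents are assumptions for the demo):

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "I am file 1, version 1" > file1.txt
git add file1.txt
git commit -q -m "first commit"

echo "I am file 1, version 2" > file1.txt  # edit a tracked file, no "git add"
echo "brand new" > thirdfile.txt           # new file git has never heard of
git commit -q -a -m "bugfix on master"     # -a stages modified tracked files only

committed=$(git diff --name-only HEAD~1 HEAD)   # files changed by the last commit
leftover=$(git status --porcelain)              # what is still uncommitted
echo "committed: $committed"
echo "leftover:  $leftover"
```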

Merging the master into your branch

Okay, so we’re back in Son Of Git, original home of feature1. Let’s see where we left off:


> git branch

feature1
feature2
* master

We’re in master, but that juicy feature2 over there is looking like fun. Let’s get over there:


> git checkout feature2

All is well, except we don’t have the latest bugfix we committed to the master — secondfile.txt is essential to our work! But feature2 isn’t ready yet. So instead, let’s merge the master into this feature branch, so we can use the bugfix:


> git merge master

Already up-to-date.

D’oh! Of course, that doesn’t work, because, remember, git always believes it is the authority. Master on this machine hasn’t changed since we created feature2, so it thinks everything is up to date. The solution is to switch back to master, and get it in sync with the server:


> git checkout master
> git pull origin master

remote: Counting objects: 13, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 10 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (10/10), done.
From git@github.com:seldo/git-training
* branch master -> FETCH_HEAD
Updating 6ebc96e..a62725d
Fast forward
feature1files/file1.txt | 2 +-
secondfile.txt | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)
create mode 100644 secondfile.txt

Again, this means “pull the latest changes in the branch called master from the remote repo called origin to our local machine”. Nothing magic about the names. With that done, let’s switch back to feature2 and try that merge again:


> git checkout feature2
> git merge master

Updating 6ebc96e..a62725d
Fast forward
feature1files/file1.txt | 2 +-
secondfile.txt | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)
create mode 100644 secondfile.txt

Wahey! Feature2 is now in sync with the server and all is well. Amazing! Now let’s go for something genuinely tricky:

Handling conflicts between your feature and the master

Augh, the dread! The horror! The merge conflicts! The best solution for conflicts is to never have them, or if you must have them, keep them small: everybody should work on isolated features, commit often, merge often, etc. But it’s too late for that now. You’ve got a conflict. Time to fix it. First, let’s create the merge conflict. Over in “git the first”, edit secondfile:


> nano secondfile.txt
>> Prepend the line "I am the master title"
> git add secondfile.txt
> git commit -m "adding a title"

Created commit 97e64c8: adding a title
1 files changed, 2 insertions(+), 0 deletions(-)

> git push origin master

Counting objects: 5, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 309 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
a62725d..97e64c8 master -> master

So far so good. Now, over in “son of git”, do something suspiciously similar in your feature branch:


> git checkout feature2
> nano secondfile.txt
>> Prepend the line "I am the feature2 title"
> git add secondfile.txt
> git commit -m "feature2 has a title"

Now, go back to master and pull in the change from Git The First, so our local repository has a copy of that change.


> git checkout master
> git pull origin master

remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From git@github.com:seldo/git-training
* branch master -> FETCH_HEAD
Updating a62725d..97e64c8
Fast forward
secondfile.txt | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

Remember, there’s no conflict here yet — your local master never changed, so all you did was pull in changes from the remote master. But now go over to your feature2 branch again:


> git checkout feature2
> git merge master

Auto-merged secondfile.txt
CONFLICT (content): Merge conflict in secondfile.txt
Automatic merge failed; fix conflicts and then commit the result.

Oh noes! You have the dreaded long lines of >>>>>s in secondfile.txt. No way around it: you edited the same line of the same file. Somebody has to win. Let’s make it be our feature. Edit the file:


> nano secondfile.txt

<<<<<<< HEAD:secondfile.txt
I am the feature2 title.
=======
I am the master title
>>>>>>> master:secondfile.txt

This is a second file, version 2

Here, because of the specific text we entered, it’s pretty clear what’s going on. The lower chunk is what got pulled in from the master branch; the upper chunk is what was in “HEAD”, i.e. the branch you currently have checked out (feature2). Delete the version you don’t want, along with the three marker lines, save, and then:


> git status

secondfile.txt: needs merge
# On branch feature2
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# unmerged: secondfile.txt
#

To tell git you’ve merged things, you need to treat it like a fresh commit. So add, then commit:


> git add secondfile.txt
> git commit -m "merge fix"
> git status

# On branch feature2
nothing to commit (working directory clean)

Nice. We can push this to the server (still on our feature branch) so other people can use it:


> git push origin feature2

Counting objects: 8, done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 454 bytes, done.
Total 4 (delta 2), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
* [new branch] feature2 -> feature2

And back over on the master branch, we can pull in our new merged copy without fear, because we already handled the conflicts. So the new merge goes off without a hitch:


> git checkout master

Switched to branch "master"
Your branch is ahead of the tracked remote branch 'origin/master' by 4 commits.

> git merge feature2

Updating 97e64c8..a63623b
Fast forward
secondfile.txt | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

Now looking at secondfile.txt, you see it’s got the feature2 title. You’re merged, you can test and push back to the server:


> git push origin master

Total 0 (delta 0), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
97e64c8..a63623b master -> master

If you look on github, you can now see secondfile.txt in the master branch, and it’s got the feature2 title in it. You’ll also notice that master never had to deal with the conflict itself: the commit that fixed it (“merge fix”) happened in the feature2 branch, so by the time you merged into master all git had to do was fast-forward. The only thing that happened in the master branch is a trouble-free merge. Whee!
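If you’d rather practice a conflict somewhere disposable before meeting one for real, the sketch below manufactures the same one in a single scratch repository. It’s a simplification under stated assumptions: both edits happen in one local repo instead of two clones and a server, and the temp directory and identity are made up.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "This is a second file, version 2" > secondfile.txt
git add secondfile.txt
git commit -q -m "second file only"

git checkout -q -b feature2
printf 'I am the feature2 title\nThis is a second file, version 2\n' > secondfile.txt
git commit -q -a -m "feature2 has a title"

git checkout -q master
printf 'I am the master title\nThis is a second file, version 2\n' > secondfile.txt
git commit -q -a -m "adding a title"

git checkout -q feature2
git merge master > /dev/null 2>&1 || echo "merge stopped on a conflict"
markers=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' secondfile.txt)   # count the marker lines

# Resolve it by hand: keep the feature2 title, drop the markers and the master title.
printf 'I am the feature2 title\nThis is a second file, version 2\n' > secondfile.txt
git add secondfile.txt
git commit -q -m "merge fix"

first_line=$(head -n 1 secondfile.txt)
leftover=$(git status --porcelain)
echo "marker lines seen: $markers"
echo "title after the fix: $first_line"
```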

What about other types of fuck-ups? There are so many…

Undo an add

You added that file you didn’t mean to. D’oh! You can reset the list of added files like so:


> git reset

This will undo all your adds, so you’ll need to re-do the ones you meant to add.
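If you only want to take back one file, the hint that “git status” prints (“git reset HEAD <file>...”) does exactly that and leaves your other adds alone. A sketch in a scratch repository; the file names keep.txt and oops.txt are hypothetical, as are the temp directory and identity.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "hello" > README
git add README
git commit -q -m "first commit"

echo "wanted" > keep.txt                   # hypothetical file you meant to add
echo "unwanted" > oops.txt                 # hypothetical file you added by mistake
git add keep.txt oops.txt
git reset -q HEAD -- oops.txt              # unstage just this one file

staged=$(git diff --cached --name-only)    # what is still queued for the next commit
echo "still staged: $staged"
```

The file itself stays on disk untouched; only its place in the next commit is withdrawn.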

Reverting un-committed files

You’ve edited a file, you don’t like it, you want it back the way it was. There are two ways. For just the one file, try:


> git checkout HEAD fuckedfile.rb

Or, if you’ve messed up everything in the entire working tree, here’s the nuclear option:


> git reset --hard HEAD

This will throw away EVERYTHING you’ve changed but not committed, so beware.
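To see the difference between the two before you try either on real work, here’s a sketch in a scratch repository. The file names a.txt and b.txt are hypothetical stand-ins, and the temp directory and identity are assumptions for the demo.

```shell
set -e
repo=$(mktemp -d)                          # throwaway directory (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called "master"
git config user.name "Example"             # made-up identity for the scratch repo
git config user.email "example@example.com"
echo "good" > a.txt                        # hypothetical files
echo "good" > b.txt
git add a.txt b.txt
git commit -q -m "first commit"

echo "bad" > a.txt                         # mess up both files
echo "bad" > b.txt

git checkout HEAD -- a.txt                 # restore one file from the last commit
a_after_checkout=$(cat a.txt)
b_after_checkout=$(cat b.txt)              # b.txt is still messed up at this point

git reset -q --hard HEAD                   # the nuclear option: restore everything
b_after_reset=$(cat b.txt)
echo "a.txt after checkout: $a_after_checkout"
echo "b.txt after checkout: $b_after_checkout"
echo "b.txt after reset:    $b_after_reset"
```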

Other stuff

This is really all you need to know to handle basic life in git. If you’ve got all this to work, you should understand what you’re doing. But I’ll add new stuff that’s more complicated when I find it.


Search Accessibility: a how-to

Posted: June 4th, 2007 | Author: Laurie | Filed under: accessibility, search accessibility, seo, webdev

In my last post, I covered the politics of search accessibility, and why making your site available to all users is above all the profitable thing to do, without considering whether it’s the right thing. So now I’m going to cover how to make your site search accessible.

Please Feed the Spider

The program that runs around the Internet reading every single page and throwing it into Google’s* giant database is GoogleBot (Yahoo!’s is called Slurp). GoogleBot is your best friend, your worst enemy, your teddy bear and your mommy all rolled into one. GoogleBot is a very, very clever piece of software, but it’s not magical. Here is what GoogleBot does:

  1. It reads the text on your page and looks for “important” words and phrases
  2. It reads the links on your page and sees what pages you’re linking to
  3. It reads the links on the rest of the Internet and looks for pages that link to you
  4. It then calculates how relevant your page is to the words in it, based on the words on the pages that link to it, and how relevant they are based on other sites, and so on

Key take-home: it’s all about keywords and links. It is all about text. Attractive design and a witty site slogan and pictures of bikini models holding your product count for naught. As I mentioned in my last post, Google is in effect a disabled user using only the most basic of assistive technologies:

  1. It cannot see your images
  2. It will not execute JavaScript. Not any. (Real disabled users can often do JavaScript using better software these days)
  3. It’s not reading every bit of text on your page. It’s looking for the important words. And it’s in an almighty hurry.
  4. It does not follow links that do not look like web pages.
  5. It does not magically work out what your site is about. You need to make it obvious.

Already, some of the key things you need to do for SEO are obvious, in order of importance:

  • Link text is important. Every time I see a link saying “click here” in 2007 it makes me want to weep. Link text is, above anything else, how Google decides what the page you’re linking to is about, and working out what you’re linking to is how it works out what your page is about. Outgoing links aren’t silly, they’re essential.
  • All your information must be accessible with JavaScript disabled, because that’s how Google sees the page.
  • All your links must go to real information. “#” links are ignored, as are “javascript:..”. And the information on those pages must be relevant to the link text, obviously: don’t just link back to your home page.
  • Any images must be described as text somewhere on the page, either within the ALT attribute, or some other technique**. Google ignores images (even image search is based on the text nearby).

What’s an Important Word?

It’s important to know what Google considers an “important” word. Google is more than a little secretive about this, but it has its own guidelines for site design, and professional, non-evil SEO people have their own search accessibility guidelines. My own very subjective impression, from several years of experience, is that the most important words on your page are:

  1. The link text going to your page. Nothing you can do about this but be a very good website, and hope people link to you. You can do your bit by linking to other people with sensible keywords, of course, and hoping they link back — but trading links explicitly is something GoogleBot is designed to detect. And it’s been spotting fakers a lot longer than you’ve been faking, so I don’t recommend trying to fake it.
  2. The page title. Don’t repeat your site name and slogan endlessly: say what this page in particular is about. Put keywords in there! It’s also what users see on the search results page, too, so make sure it makes sense to human beings.
  3. The meta description tag. This is an odd one: Google doesn’t pay too much attention to it in calculating relevance, but at a certain level of relevance, this is what it puts as the text under your page title in the search results, where it suddenly becomes very important to users who are about to click your link. So it’s important that this text is descriptive, useful, and short — something under 100 words. And again, load up on keywords. Repeat yourself, phrasing the same thing several different ways.
  4. H1 and H2 tags. H3 is dicey and everything beyond that is meaningless. H1s in particular are super important, but only because they are rare. If everything on your page is a goddamn H1, obviously Google is going to ignore you. Use 1 or 2 H1s, and fewer than 10 H2s.
  5. ALT attributes on images. This is way down the list, so if you have really important text in your images, it’s best to use the technique I outlined in the third footnote so that it turns up in an H1 or H2.

Order is important, or, Don’t use Tables

Another aspect of your page that is extremely important to Google is source code order: literally, the order things appear in your source. Things that appear early on are likely to be more important than things that appear later. That seems obvious, right? But now look at your code: you’ve got the head, full of juicy meta data, and then you’ve got 5k of navigational elements, sidebar text, various other cruft, just placed first because you were using a left-floated column and so it was easier to put it there. This is killing you.

What’s much worse is when your source code order physically separates content that is semantically related: for instance, your headline is at the top of your page, then you have 5k of navigational cruft, then you have your content. Google will either fail to realise that your headline is describing your content, and thus not link the words, or worse, it will decide that your page doesn’t actually have any content on it relating to your headline, and you’re trying to spam it. Danger, Will Robinson!

And of course the number one offender from this perspective is using tables for layouts. If you care about web development, you’re probably aware that tables have serious issues with flexible, attractive layouts. But that’s usually not a good enough reason to take to your boss: after all, it doesn’t bother her that your job is hard. Tell her that using tables is causing an 80% drop in traffic to your site (as I explained in the last post) and suddenly you have an easy, obvious business case for reworking the layout of your code.

Tables put data into a grid layout. If your data is in columns (and it frequently is), this means you often end up with a site code layout that looks like this:

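+-------------+------------------+
| Site logo   | Article headline |
+-------------+------------------+
| • List      | Article body     |
| • of        |                  |
| • nav       |                  |
| • links     |                  |
+-------------+------------------+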

To Google this reads like:

  • Site logo
  • Article headline
  • List
  • Of
  • Nav
  • Links
  • Article body

So you can see why Google might get confused. Examine your code, and put things in order of importance: you can use CSS to move stuff around on the page later. Coincidentally, source code order is also the order in which screen readers will read out your page to a blind user. So once again there’s a useful coincidence: when you make your site search accessible, you also make it accessible.
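
To see source order for yourself, here is a rough sketch in Python (standard library only) that walks a page and lists its text in the order it appears in the source. The two sample layouts and the names SourceOrderText and reading_order are invented for illustration, and a real crawler is far more sophisticated than this, but the ordering problem is the same.

```python
from html.parser import HTMLParser


class SourceOrderText(HTMLParser):
    """Collects non-empty text chunks in the order they appear in the source."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip the whitespace between tags
            self.chunks.append(text)


def reading_order(html):
    parser = SourceOrderText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# The table layout described above: logo and headline in row one,
# nav links and article body in row two.
TABLE_LAYOUT = """
<table>
  <tr><td>Site logo</td><td><h1>Article headline</h1></td></tr>
  <tr>
    <td><ul><li>List</li><li>of</li><li>nav</li><li>links</li></ul></td>
    <td>Article body</td>
  </tr>
</table>
"""

# One possible content-first ordering; CSS would position these on screen.
CONTENT_FIRST = """
<h1>Article headline</h1>
<div id="content">Article body</div>
<div id="logo">Site logo</div>
<ul id="nav"><li>List</li><li>of</li><li>nav</li><li>links</li></ul>
"""
```

Running reading_order(TABLE_LAYOUT) gives the confused list shown above, with the nav links wedged between the headline and the body. Running it on CONTENT_FIRST puts the headline and the body next to each other, which is the whole point.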

Of course — and I would have thought this was obvious, but I get questions about it that indicate to the contrary — you can use tables when the data is tabular. Don’t try to mark up your spreadsheet data using a series of stacked lists. Tables have real semantic meaning, but it has been diluted almost beyond help by their consistent misuse.

There is more I could tell you about SEO — the various hazily-defined statistical rules about how many links on a page is too many, optimal keyword density, and more, but these advanced techniques are icing on the cake, and the cake is made of search accessibility. It doesn’t matter what your keyword density is if Google can’t even get to your pages. So get out there and make the case for accessibility. And when the traffic is rolling in and your boss is giving you your huge bonus, you can get a tiny little extra bit of joy from knowing your site is also accessible to disabled users.

* When I say Google, obviously I mean Yahoo!, Ask and all the other major search engines as well. They all work the same way. If Google didn’t want me to use their name to mean all search engines, they shouldn’t have made it a verb.

** For important text like headlines, it’s often better to put the text into the page directly in a semantically-meaningful element (like H1, H2, etc), make the text transparent, and then put the nicely-styled image in as a background image. This makes no difference to what your users see but it makes the words look a lot more “important” to Google.


Accessibility and SEO: or, why accessible websites are not for the disabled

Posted: May 31st, 2007 | Author: Laurie | Filed under: accessibility, search accessibility, seo, webdev | 2 Comments »

So I was at the @media America conference last week. There was much talk of accessibility and how to do it properly, when to do it, and even when not to do it. There was also talk about why to do it, but that’s where I think the speakers dropped the ball. Accessibility is not about helping disabled people: it’s about money, and you making more of it. (I’m going to use a lot of bold text in this post to emphasize stuff. That’s because it’s long, and you’re skim-reading. See, I know you.)

Accessibility: because it’s the Right Thing™?

The dirty secret of accessibility, swept under the rug by many an evangelist, is that the cost of making your site accessible is relatively high: in my experience, something like 20% additional dev time on a new project, although experienced developers can bring this down, and the cost decreases dramatically for incremental updates once the project is up and running. But a 20% margin is definitely non-trivial. And if you’ve not been thinking in terms of accessibility from the start, this price tag rises sharply: retrofitting accessibility often involves fundamentally reworking the architecture of a web page*. You’ll be looking at spending something like 50% of the time you spent originally developing the site on the retrofit. Ouch.

The other dirty secret of accessibility is that the number of disabled users is relatively low. Not tiny, but I often hear figures like “60% of Americans are disabled”, and while this is true, it’s disingenuous because that figure can include people like amputees or paraplegics who can use the web with no problems whatsoever. The truth is that somewhere between 10% and a maximum of 20% of your users will have trouble using your site without assistive technologies. This makes it a very close call, when starting a new project — serving 80% of your possible users doesn’t seem ideal, but is an acceptable loss to get it out of the door 20% faster, right? You can build the accessibility in later!

Except you can’t. After launch, you’ve got an inaccessible site and you’re facing a 50% dev time bill to retrofit that accessibility in: another 3 weeks on what was a 6-week project, just to get 20% more users? That makes no business sense: much better to build another project, and get another 80% of users in the door quickly.

This is the kind of unavoidable math that has made the web inaccessible today. And that’s the harsh truth: building in accessibility for disabled users does not make business sense. It’s still a good idea, a noble idea, but it’s not a financially sound one. This is true in the real world, too, which is why legislation was necessary to force everybody to put accessible toilets and wheelchair ramps in everywhere.
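
For the record, here is that math as a tiny Python sketch. The 20% and 50% rates are the rough figures from my own experience quoted above, not industry constants, and the function name is made up for illustration.

```python
def accessibility_costs(base_weeks, upfront_rate=0.20, retrofit_rate=0.50):
    """Return (extra weeks if built in from the start, extra weeks to retrofit).

    The default rates are the post's rough estimates, not measured data.
    """
    return base_weeks * upfront_rate, base_weeks * retrofit_rate


upfront, retrofit = accessibility_costs(6)
print(f"built in: +{upfront:.1f} weeks, retrofitted: +{retrofit:.1f} weeks")
```

For the 6-week project above, that is about 1.2 extra weeks if you plan for it, against 3 extra weeks if you don’t.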

Accessibility: because you could get sued?

Of course, legislators have (eventually) worked out this problem, and as such there is already web accessibility legislation in place in many countries that makes it illegal to produce an inaccessible website. Problem solved! It’s the law! We have to do it! Right?

In an ideal world, yes. In the real world, the law is only patchily enforced. Only a few very large, very high-profile sites have been sued so far (plus some government sites). You can always fly under the radar, hope nobody notices, and not build in accessibility until they sue you. It’s a good gamble to make to avoid increasing the cost of your site by 50%, right? Again, the math defeats us.

But this is all very unsatisfying. You, the clever, compassionate, standards-compliant modern web developer, feel that this cold logic is intrinsically, morally wrong. So you make the case for accessibility: you try to inflate disabled user numbers (counterproductive; it will make your manager trust you less) and deflate the amount of time it will take to make it accessible (an even worse idea; now you’re missing deadlines because of “that damn accessibility stuff”, making your manager hate the whole idea).

So here’s how you, as a developer, can stay true to your noble impulses to build an elegant, accessible website: stop calling it accessibility.

SEO: Open up, Google is coming!

Search Engine Optimization, or SEO, is the hot shit right now. Google is the Internet for a lot of people, and if Google can’t find it, then it doesn’t exist. Nobody goes deep-diving on a site to try and dig up information anymore. Either they type in their search terms and your site comes up with exactly what they need on that page, or they will never click your link. Sites these days get 50-90% of their traffic from search engines**, and the overwhelming majority of that is deep links to pages within the site.

So it’s absolutely imperative that search engines be able to access your site, and this isn’t just keywords on your home page: Google must be able to get at every single page of the site, every nook and cranny, and see every little bit of information. A site that can’t be indexed is throwing away up to 90% of its audience. In other words, this traffic is lost by sites that are not search-accessible. And there’s an interesting word in that phrase.

Search Accessibility: because you’d be an idiot not to

Here’s the final dirty little secret of this situation: Google is a disabled user. Or more accurately, Google has all the same limitations as somebody using assistive technologies:

  • It doesn’t look at pictures
  • It does not execute any JavaScript
  • It isn’t reading every bit of text on your page; it is looking for the important bits

Suddenly, the equation changes: at least 55% of your users need your site to be accessible, and possibly over 90% do. Only 10-20% of them need it to be accessible all the time, but that doesn’t matter, because up to 90% of your users will never even visit your site if it isn’t search accessible. This isn’t out of solidarity, or legislation. They simply won’t find it. Search accessibility is not an optional component, to be bolted on after the main launch. Chances are, if you haven’t got your search accessibility right, there will never be a second launch, because your site will fail.

How can I further underline the importance of search accessibility to a web-based business? Let’s turn the numbers around: you can more than double traffic to your website by making it search accessible. Does that sound like something you could take to your manager as a business case? Keep in mind, 50% traffic from search engines is an absolute minimum. If you’re getting 90% of your traffic from Google, then making yourself search accessible will result in a tenfold increase in traffic. Those sorts of numbers are why SEO is now big business, with a whole industry built around paying consultants to tell you how to get it right. That industry wouldn’t exist if they weren’t getting results.
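
If you want to plug your own numbers in before you go to your manager, the arithmetic looks like this in Python. It assumes the simplest possible model: a fixed share of a search-accessible site’s visitors arrive via search, and an inaccessible site loses all of them. The function names are invented for illustration, and the second function is just one reading of where the 55% figure above comes from.

```python
def traffic_multiplier(search_share):
    """How many times bigger total traffic is once search visitors can arrive.

    search_share is the fraction of a search-accessible site's visitors
    who come from search engines (0.5 to 0.9 in this post).
    """
    if not 0 <= search_share < 1:
        raise ValueError("search_share must be at least 0 and below 1")
    return 1 / (1 - search_share)


def share_needing_accessibility(search_share, disabled_share):
    """Fraction of potential users who need the site to be accessible:
    everyone arriving via search, plus disabled users among the rest."""
    return search_share + (1 - search_share) * disabled_share
```

With a 50% search share the multiplier is 2, and with 90% it is roughly 10, which is where "double" and "tenfold" come from. Likewise, a 50% search share plus 10% disabled users among the remainder gives about 0.55.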

But you don’t need to pay somebody. Once you’ve got the big, obvious business case out of the way, and swallowed the bitter pill that doing things properly will take 20% longer, search accessibility is super-easy. For my own personal how-to for search accessibility, see my next post.

* For example, if you’ve put a lot of business logic into JavaScript to enable Ajax goodness, making it accessible often means moving this logic to the server-side, which means reimplementing in a different language entirely, which is terribly expensive. You can write Ajax accessibly, so that business logic is always on the server and Ajax is merely a bridge, but you have to be thinking about it from the start. And as we’ve already established, you didn’t do that.

** This figure is affected by the type of site, and the levels of traffic to that site. So your blog might get half its traffic from regular readers, but on an e-commerce site the figure is going to be 90%.


Why use PHP frameworks?

Posted: February 26th, 2007 | Author: Laurie | Filed under: frameworks, php, webdev | Comments Off

Perl was once, and in fact remains, a perfectly capable language for writing web applications. But capable is not the same as suitable: it just wasn’t as good a choice for web applications as PHP, because even in version 2.0 of PHP you could do all the same things using built-in functions, and people recognized that these were faster and more reliable than building the same things themselves. The savings produced by not reinventing the wheel outweighed even the problems of switching languages.

Frameworks — like Ruby on Rails, and a raft of emerging PHP on Rails MVC frameworks, of which my favourite is CodeIgniter — are just the next generation of this principle. Where once we marvelled at how easy PHP makes it to query a database (only 5 lines of code!), now we can marvel at how easy a framework makes it (after you’ve set up your framework, only one line of code!).

The “I don’t need all that” trap

A common issue experienced coders run into when they look at frameworks is that they will say “this framework is too heavy, I don’t need any of that stuff”. This is particularly likely if what they’re building is supposed to be an experiment or a “prototype”. Being too heavy is of course a perfectly valid criticism of some frameworks, depending on the nature of your web application. But to use the same excuse to brush off frameworks in general is dangerous, and the “prototype” excuse even more so.

You don’t need the overhead of a framework to build your single-page blog software: often, doing it from scratch would be shorter and possibly even faster. This is, in fact, a problem with the overly simple demos the frameworks often promote via screencasts to demonstrate their capabilities. What you need a framework for is when your application becomes more than a single page, and when there’s more than one developer working on it.

This is why the “prototype” argument is also a false one: when’s the last time you actually threw away a prototype? You build the base, it works, so you tell everyone in the office about it. So you add a few extra features, tidy things up, eventually things snowball and the whole thing goes into production. And by the time you do that, you have to start maintaining it, and you should have used a framework.

The reason you need frameworks is that there’s no such thing as a small application. There are just baby applications, which like all babies are small, simple and cute, and old applications, which are bigger, uglier, and frequently stink.

The real benefit of a framework is not in the screencast

There are two main branches of benefits of using frameworks:

  1. First-write functionality: by not reinventing the wheel, you develop faster
  2. Re-write functionality: having code that fits a standard pattern eases debugging, maintainability, and portability of code from one developer’s brain to the next.

The first benefit is the one the frameworks advertise in their screencasts, and often the area where they concentrate their further development efforts, building up complex AJAX and other functionality*. But it’s my strongly-held conviction that the second benefit is by far the greater. However, it’s achieved pretty much as soon as the framework is created, so it’s difficult for the developers working on the framework to remain excited about it. Fundamentally, significantly reduced maintenance cycles aren’t sexy, but they are useful as all hell.

Maintainability remains an enormous stumbling block in web development. It’s easy to write a monolithic, procedural script that handles all your data capture, validation, processing and output. As long as you keep your internal model of that script in your head, it’s even relatively easy to maintain. The problem turns up six months later: you’ve forgotten how you wrote the script, or worse, you’ve had to pass the code on to another developer, and they can’t make head or tail of it.

A framework gives you built-in breakpoints for effective debugging: if the return statement in your model code is returning accurate data, you know for sure the problem is presentational, and vice versa. The nature of framework URLs makes tracing execution whole orders of magnitude easier: you know automatically which function is being called in which controller, just from the URL. As your project begins to grow past the capabilities of a single developer, these features become essential, since your team members will be working with code they didn’t write. Frameworks by their nature provide your team-mates with a lot more information about function X than your average function name.
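
To make the URL point concrete, here is a minimal sketch of convention-based routing. It is written in Python rather than PHP purely for brevity, and it assumes the /controller/function/arguments URL scheme that CodeIgniter-style frameworks use. The Articles controller, the CONTROLLERS registry and the dispatch function are all hypothetical names; a real framework does a great deal more than this.

```python
class Articles:
    """Hypothetical controller; its registry key is the first URL segment."""

    def index(self):
        return "listing articles"

    def view(self, article_id):
        return f"showing article {article_id}"


# Registry of controllers, keyed by the name that appears in the URL.
CONTROLLERS = {"articles": Articles}


def dispatch(path):
    """Route /controller/function/arg1/... to the matching controller method."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise LookupError("no controller in URL")
    controller_name, *rest = segments
    method_name = rest[0] if rest else "index"  # default function
    args = rest[1:]
    controller_cls = CONTROLLERS.get(controller_name)
    # Refuse unknown controllers and private/dunder attributes.
    if controller_cls is None or method_name.startswith("_"):
        raise LookupError(f"no route for {path}")
    method = getattr(controller_cls(), method_name, None)
    if not callable(method):
        raise LookupError(f"no route for {path}")
    return method(*args)
```

Given the URL /articles/view/42, anybody on the team can tell without opening a single file that the code to look at is the view function in the Articles controller, called with the argument 42.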

Speaking of which, some will say: what about naming conventions? Surely a framework is just a really elaborate naming convention, involving whole directories and files rather than just function names?

Not really. It’s a naming convention, but more importantly a coding convention: we don’t just specify what the function does, we specify where whole classes of related functions go, and the manner in which they interact. This provides those debugging benefits I mentioned earlier, and MVC also provides a structure that scales easily to applications with dozens of models and hundreds of views: modern web applications.

Maturity, at last

Frameworks provide something that has never before existed in the web development field: a convention that exists in more than one company. The more we as an industry use frameworks, the greater the network effect: it means code and debugging can work effectively across companies, and that new hires will be able to quickly understand the operation of your software and get productive faster.

Frameworks are a sign of a new maturity in the field of web development, a side-effect of the shift from writing “pages” to “sites” to “applications”. And it’s about time.

* I have yet to see a framework build AJAX that meets a high standard of web development. It doesn’t count as separating behaviour from content if your view has to make a bunch of ajax-specific calls and you end up with a bunch of inline onclick handlers.

** Unless you write consistent, comprehensive and up-to-the-minute documentation, of course. But nobody ever has. No, not even that guy.