On Apple, the iPhone, and web apps
Posted: April 12th, 2010 | Author: Laurie | Filed under: apple, iphone, webdev
I’ve written about Apple’s ban on intermediate platforms, and what this means for web apps over on my main blog.
I got this when trying to install the ruby thrift client. It’s trying to compile a native extension, which needs ruby 1.8 dev version. Simple fix:
sudo apt-get install ruby1.8-dev
I suck at VCS. There, I’ve said it. I love the concept, I can see why it’s useful, I require no persuasion that I should be doing all my feature development in hundreds of little branches and pushing them back and forth at will. However, when I attempt to put those plans into action, I routinely fuck everything up and end up deleting long lines of “>>>>>” and accidentally breaking shit. It’s embarrassing.
So I’m determined not to do this with git. I will master how branching and merging works in git. And, because I drink too much caffeine and can therefore never remember anything, I will write it down in terms the developmentally-challenged could understand, because that’s the kind of documentation that works for me. And maybe it will work for you, too.
I’m writing this down as I go. When I fuck stuff up, I will leave it in, on the basis that finding out why stuff doesn’t work is at least as instructive as a perfect happy-path demonstration of everything working. This is a work in progress; I will add new things when I learn how to do them.
I’m going to assume you created a repo on github, using the GUI. Because that’s what I did. I’ve called it “git-training”. It won’t matter for a while though. For now, on your local machine, run:
> mkdir git-training
> cd git-training
> git init
You’ve created a directory and told git to treat it as the root of a repository. That’s all. Nothing’s in it yet, and it’s not connected to anything. The beauty of git is that you can start committing before needing to do any of that — in fact, if you felt like it, you could go on forever without pushing to github. That’s the “distributed” part of git. Git treats the local repo as authoritative and the server as optional. You can see what’s going on by typing
> git status
# On branch master
#
# Initial commit
#
nothing to commit (create/copy files and use "git add" to track)
As you can see, nothing’s happened yet, and it’s suggesting your next step. It also says you’re on branch “master”, which is the default for the initial branch and is not magical in any way. It’s called “master”, but it could be called “walrus”. Don’t get hung up about that. So let’s do what it suggested, and create and add a file:
> nano README
(Or vim or whatever. I use nano because it’s the editor that doesn’t require me to know random key combinations in advance. Stop judging me.)
Put something in the file. Github likes READMEs.
> git add README
Git now knows about this file, and will track what’s going on with it.
> git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: README
#
“Initial commit” means it knows you’ve never committed anything before. It’s listing the files you could commit. There’s one, and it knows about it already because you added it. Let’s see what happens if you create a second file, without adding it:
> nano secondfile.txt
>> and, you know, put something in it.
> git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: README
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# secondfile.txt
Now there’s a difference between these files: it can see both, but only one is set up to be committed. Let’s see what happens if you try to commit:
> git commit -m "first commit"
Created initial commit ef437ec: first commit
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 README
Note that you’ve not said which file to commit, so it commits everything you’ve added: README, but not secondfile.txt. It doesn’t care that you’ve not told it about secondfile.txt. It will keep leaving that file out of commits until you add it.
The random string is a unique ID for the commit. The file-changes stuff is self-explanatory, and I dunno what the rest means, frankly. Let’s find out where this has left us:
> git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# secondfile.txt
nothing added to commit but untracked files present (use "git add" to track)
There’s secondfile.txt, gleefully ignored. status won’t tell us about README, because it’s already committed as far as local git is concerned: local git considers itself just as authoritative as the upstream server — which, at this point, it hasn’t even heard about.
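If you want to replay that first commit without typing into an editor, here is a rough sketch. It assumes a scratch directory from mktemp and a made-up "Demo" identity, neither of which is part of the walkthrough above; the point is only to show that the commit picks up what was added and nothing else.

```shell
# Scratch repo so nothing real gets touched (temp dir and demo identity
# are assumptions made for this sketch).
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
echo "ignored for now" > secondfile.txt

git add README                  # stage README only
git commit -q -m "first commit"

tracked=$(git ls-files)                                 # files git is tracking
untracked=$(git ls-files --others --exclude-standard)   # files it can see but isn't tracking
echo "tracked:   $tracked"
echo "untracked: $untracked"
```

The two lists are the scripted version of what git status showed above: README is in, secondfile.txt is still waiting to be added.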
So let’s take our local master and push it to github.
> git remote add origin git@github.com:seldo/git-training.git
This is quick, because all you’re doing is telling local git that the remote git repository exists (git@github.com is the standard username shared by everybody, seldo is my personal namespace within github, and git-training is my specific repository). You’ve also given it a name: “origin”. Again, nothing magical about this name, it’s just a convention. Call it “hippo” if that floats your boat.
Now we want to push the local repository to the remote one. To do this, you’re going to need github to know about your local SSH key. Github has really great instructions on how to do this, and will automatically send you to the ones for your own operating system. So go there and do that, and then come back.
> git push origin master
When I first started, this confused the hell out of me, but since I’ve given you some background it should be obvious what you’re doing. The first argument to push is the destination (despite the name) and the second is the source. You’re saying “push from branch ‘master’ to remote repo ‘origin’”. The output looks like this:
Counting objects: 3, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 268 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
* [new branch] master -> master
Again, I’d be lying if I said I knew what all of this crap means, but there you go: it’s on github now, in a branch called “master” (remember, “origin” is just your local label for the remote repository, not a branch name).
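To see for yourself that the remote's name is only a label, you can practice the push against a local bare repository instead of github. This is a sketch under assumptions: the bare repo in a temp directory stands in for the server, and the "Demo" identity is invented for the example.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"      # stand-in for the github repo (assumption)
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "first commit"

git remote add hippo "$work/server.git"    # "hippo" instead of "origin": it's only a label
git push -q hippo master                   # destination first, then the branch to send

remote_heads=$(git ls-remote --heads hippo)   # ask the remote which branches it has
echo "$remote_heads"
```

The last command lists the branches on the remote side, which should now include master.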
Woo! You’ve got stuff into the repository. Go you. But don’t stop here. There’s more to learn. If you’re ready to start coding properly, then go to “creating a feature branch”. If you’re trying to hack on something that already exists, then you want to read this next section.
Say somebody else did all the above for you already. You need to work on a copy of the code, like RIGHT NOW. You don’t want to look like an idiot. Get started! On a fresh machine (or in a subdirectory somewhere away from the one where your current git-training directory is), just run:
> git clone git@github.com:seldo/git-training.git
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
Woo! Now there’s a directory called git-training, and inside it is the README file you pushed up earlier.
In my own testing, I put this inside a new directory called “son-of-git”, so that I could have both on the same machine without conflicting.
Hooray! Now you’re ready to edit code!
So you’ve either created or cloned a repository, and now you want to edit code. Top tip: don’t do it in the master branch. You want your master branch to be pure and full of final, production-ready code at all times. At the same time, you don’t want to go more than a few hours without committing the changes you’re making to your feature. So create a branch, and commit to it regularly, and make sure everybody else does too. If you need to collaborate with somebody, don’t share a branch; instead, just pull from their branch into yours.
So first, let’s look at where we are:
> git branch
* master
Just one branch, master, and it’s got the * to tell you it’s selected. Let’s create and switch to a new branch:
> git branch feature1
This creates the branch. Dunno why they couldn’t have thrown the word “create” in there, especially since “git branch” by itself lists branches, but hey, that’s command-line software for you. Rationality is for wimps.
> git checkout feature1
This switches to the branch. Again, not a very intuitive command name.
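If the two-step create-then-switch routine grates, git checkout has a -b flag that does both at once. A small sketch in a scratch repo (the temp directory and "Demo" identity are assumptions for the example):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "first commit"

git checkout -q -b feature1     # same as: git branch feature1, then git checkout feature1
current=$(git symbolic-ref --short HEAD)
echo "now on: $current"
git branch                      # lists both branches, star on feature1
```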
> git branch
* feature1
master
Woot. Two branches listed, and we’re in feature1 (the star again). Let’s do some changes.
> mkdir feature1files
> cd feature1files
> nano file1.txt
>> insert "I am file 1, version 1"
> nano file2.txt
>> insert "I am file 2, version 1"
Where has this left us?
> cd ..
> git status
# On branch feature1
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# feature1files/
Okay! Git has spotted the new folder. We need to add the files. You can go about this in two ways. Either tell it about each file specifically (you don’t need to add the parent directory first):
> git add feature1files/file1.txt
> git status
# On branch feature1
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: feature1files/file1.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# feature1files/file2.txt
Or you can add an entire directory at once, and git is smart enough to get everything in it:
> git add feature1files
> git status
# On branch feature1
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: feature1files/file1.txt
# new file: feature1files/file2.txt
Now you’re done for the day. Don’t hesitate: commit that shit!
> git commit -m "I worked on feature 1"
Created commit 595dc85: I worked on feature 1
2 files changed, 2 insertions(+), 0 deletions(-)
create mode 100644 feature1files/file1.txt
create mode 100644 feature1files/file2.txt
Again, this is super-quick, because it’s just talking to the local git repository, not the remote one. If you look on github right now, nothing has changed, because you haven’t pushed yet. Let’s try that now:
> git push origin master
Everything up-to-date
Wait, why did that happen? Look at what you asked it to do: “push from branch ‘master’ to remote repo ‘origin’”. And of course, on the branch “master”, nothing has changed: you’re in the branch called “feature1”. What you want to do is push from “feature1” to the remote repo, so instead you need to do:
> git push origin feature1
Counting objects: 6, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 393 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
595dc85..33c25f2 feature1 -> feature1
So now go look at github. You’ll see instead of 1 branch, it now lists 2: master and feature1. Your feature is up in the cloud, safe and sound, and somebody else could check it out and use that feature in their own dev box. But you’ve not buggered up the master. Amazing!
The ability to push changes from any branch, no matter what branch you’re in, is one of the amazing parts of git, but to a Subversion or CVS user it’s also one of the most confusing.
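Here is a sketch of that "push any branch from anywhere" behaviour. It assumes a local bare repo in a temp directory playing the part of github, plus an invented "Demo" identity; the commit is made on feature1, but the push happens while master is checked out.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"      # stand-in for the github repo (assumption)
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "first commit"
git remote add origin "$work/server.git"
git push -q origin master

git checkout -q -b feature1
echo "I am file 1, version 1" > file1.txt
git add file1.txt
git commit -q -m "I worked on feature 1"

git checkout -q master            # we're sitting on master now...
git push -q origin feature1       # ...but feature1 is what gets pushed

remote_heads=$(git ls-remote --heads origin)
echo "$remote_heads"
```

The branch named on the command line decides what is sent; the branch you happen to be standing on plays no part.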
Even more amazing, you can now switch to another branch, instantly. Try it:
> git checkout master
Do an ls and suddenly, no directory called feature1files — but all is well!
> git checkout feature1
Aaaand it’s back. No muss, no fuss, and no talking to the server. Remember, the local git repository is the authority. It knows about all the branches you’ve dealt with and their current state. It only ever needs to talk to the server to get stuff other people have done, or to push your changes there. Again, this is a big mental adjustment you need to make.
To emphasize this, let’s do something that, if you’re used to centralized version control, seems a little wacky: switch between branches without telling the server about new files. First create a file:
> cd feature1files
> nano file3.txt
>> and put something in it
Without adding it, let’s switch to the master branch and see what happens:
> git checkout master
Directory “feature1files” is still with us, but only 1 file is in it: file3.txt. That’s because git doesn’t know what to do with file3, but it’s certain that file1 and file2 aren’t in master. So switch back to the branch:
> git checkout feature1
Add the file:
> git add file3.txt
At this point, you can still switch back and forth between branches and file3.txt will come with you. But commit the file:
> git commit file3.txt -m "added a new file"
Created commit 33c25f2: added a new file
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 feature1files/file3.txt
Again, naming file3.txt isn’t necessary: git would have committed whatever you’d added. Now if you switch:
> git checkout master
No feature1files directory! Because now git knows where everything is, and it’s not in master. If you look at github, you won’t see file3 there either — it’s inside git, locally. You can get back there quickly:
> git checkout feature1
Woo, everything is back!
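The whole file3 experiment can be condensed into one script. This is a sketch in a scratch repo (temp directory and "Demo" identity are assumptions): it checks whether the file is visible from master before and after it has been committed on feature1.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "first commit"
git checkout -q -b feature1

echo "something" > file3.txt      # untracked: git isn't managing this file yet
git checkout -q master
if [ -f file3.txt ]; then before_commit=present; else before_commit=absent; fi

git checkout -q feature1
git add file3.txt
git commit -q -m "added a new file"
git checkout -q master            # file3.txt now belongs to feature1 only
if [ -f file3.txt ]; then after_commit=present; else after_commit=absent; fi

echo "on master before commit: $before_commit"
echo "on master after commit:  $after_commit"
```

An untracked file just sits in the working directory and tags along; once it is committed to a branch, git takes it away when you leave that branch.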
Okay, enough messing about. You’re ready to merge your feature back into the master, because it’s ready for production. It’s pretty easy. First, get to the master branch:
> git checkout master
And merge in the feature:
> git merge feature1
Remember, you’re merging locally here. Nothing is happening to anything but your local copy, and git knows about all the versions of everything, so you can always undo this.
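On that last point, one way to undo a merge you regret is git reset --hard ORIG_HEAD, since git records where the branch was just before the merge. The sketch below runs in its own scratch repo (temp directory and "Demo" identity are assumptions), so it has nothing to do with the merge you just performed. Note that reset --hard also discards uncommitted changes.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "first commit"

git checkout -q -b feature1
echo "I am file 1, version 1" > file1.txt
git add file1.txt
git commit -q -m "I worked on feature 1"

git checkout -q master
before=$(git rev-parse HEAD)
git merge -q feature1             # fast-forwards master to feature1
merged=$(git rev-parse HEAD)

git reset -q --hard ORIG_HEAD     # ORIG_HEAD is where master was before the merge
after=$(git rev-parse HEAD)
echo "before: $before"
echo "merged: $merged"
echo "after:  $after"
```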
Now feature1files has turned up. As far as local git is concerned, you’re done. The branches are merged (unless you had some conflicts, which we’ll deal with later). You can switch to some new feature branch now if you want:
> git branch feature2
> git checkout feature2
This new branch already has feature1 in it. If you switch back to master, you’ll see a warning:
> git checkout master
Switched to branch "master"
Your branch is ahead of the tracked remote branch 'origin/master' by 3 commits
Git knows about the remote repo called “origin”, which has a branch called ‘master’. It’s letting you know you’ve not pushed there. Let’s do that now:
> git push origin master
Counting objects: 6, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 367 bytes, done.
Total 4 (delta 1), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
ef437ec..6ebc96e master -> master
Nice! A refresh over at the github site will now show you feature1 files in the master. But what if you wanted to do the opposite, and pull changes from the master to be merged into your local copy? That’s a section after this one, but first…
Remember back at the beginning, when we created the repo in the first place, either on another machine, or just some other directory? You’ve been happily building new features over in your cloned copy (“son of git”), and ignoring the original (let’s call it “git the first” to avoid future confusion). Time to catch up! Let’s try the dumb way first:
> git pull
remote: Counting objects: 14, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 13 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (13/13), done.
From git@github.com:seldo/git-training
* [new branch] feature1 -> origin/feature1
ef437ec..6ebc96e master -> origin/master
You asked me to pull without telling me which branch you
want to merge with, and 'branch.master.merge' in
your configuration file does not tell me either. Please
name which branch you want to merge on the command line and
try again (e.g. 'git pull <repository> <refspec>').
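Blah, blah, blah. All hell has broken loose. The reason is in the error: this repo was created with git init rather than git clone, so nobody ever told the local “master” which remote branch it follows (that’s the ‘branch.master.merge’ setting it’s moaning about). And because there’s no magic involved in branch names, git won’t assume the remote “master” is the one you meant, any more than “feature1”. Git needs you to say which remote to pull from and which branch you want. So let’s tell it: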
> git pull origin master
From git@github.com:seldo/git-training
* branch master -> FETCH_HEAD
Updating ef437ec..6ebc96e
Fast forward
feature1files/file1.txt | 1 +
feature1files/file2.txt | 1 +
feature1files/file3.txt | 1 +
3 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 feature1files/file1.txt
create mode 100644 feature1files/file2.txt
create mode 100644 feature1files/file3.txt
This translates to “pull the branch ‘master’ from remote repo ‘origin’ into the local branch called ‘master’”, which it does with a minimum of fuss. Neat!
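If typing the remote and branch every time gets old, you can set the configuration the error message was asking for, after which a bare git pull works. A sketch under assumptions: a local bare repo in a temp directory stands in for github, "first" and "son" are scratch copies of the two working directories, and the "Demo" identity is invented.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"      # stand-in for the github repo (assumption)
git --git-dir="$work/server.git" symbolic-ref HEAD refs/heads/master

# "git the first": created with init + remote add, so no tracking info yet
git init -q "$work/first"
cd "$work/first"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "first commit"
git remote add origin "$work/server.git"
git push -q origin master

# Tell local master which remote branch it follows: the settings the
# error message mentions.
git config branch.master.remote origin
git config branch.master.merge refs/heads/master

# "son of git" pushes some new work to the server
git clone -q "$work/server.git" "$work/son"
(
  cd "$work/son"
  git config user.email "demo@example.com"
  git config user.name "Demo"
  echo "I am file 1, version 1" > file1.txt
  git add file1.txt
  git commit -q -m "I worked on feature 1"
  git push -q origin master
)

git pull -q                        # no arguments needed now
ls
```

Repos made with git clone get this configuration for free, which is why the cloned copy never complained.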
Now we can make a bugfix to our master copy, and even get around to adding that “secondfile.txt” we created way back in the day.
> nano feature1files/file1.txt
>> change "version 1" to "version 2"
> git add secondfile.txt
> git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: secondfile.txt
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: feature1files/file1.txt
#
Okay, that’s what we wanted to change, so let’s commit it:
> git commit -m "bugfix on master"
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 secondfile.txt
Hmmm, only one file? Let’s see what happened:
> git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# modified: feature1files/file1.txt
#
Hmmm, that’s not what we were expecting… file1.txt is already part of the repository, why didn’t our commit get it? Because git only commits what you’ve explicitly told it to commit, with “add”. Earlier, we added everything, so commit got everything. So let’s add file1 to this commit, and commit again:
> git add feature1files
> git commit -m "bugfix, round 2"
Created commit b645bd3: bugfix, round 2
1 files changed, 1 insertions(+), 1 deletions(-)
> git status
# On branch master
nothing to commit (working directory clean)
Woo! Now it’s all working. Just to make sure we understand how that works, let’s do it again. Change one file, and create another:
> nano secondfile.txt
>> append "version 2"
> nano thirdfile.txt
> git add secondfile.txt
> git commit -m "second file only"
Created commit a62725d: second file only
1 files changed, 1 insertions(+), 1 deletions(-)
> git status
# On branch master
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# thirdfile.txt
The change to secondfile went in; thirdfile, which we never added, did not. Let’s push our change to secondfile up to the server, so we can use it in our next lesson:
> git push origin master
Counting objects: 13, done.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (10/10), 1.01 KiB, done.
Total 10 (delta 1), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
6ebc96e..a62725d master -> master
And a quick peek at github’s master branch shows secondfile.txt has turned up.
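Looking back at the commit that skipped file1.txt: if you want a commit to sweep up every change to files git already tracks, without adding each one, git commit accepts an -a flag. A sketch in a scratch repo (temp directory and "Demo" identity are assumptions); note that -a still leaves brand-new files alone.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "I am file 1, version 1" > file1.txt
git add file1.txt
git commit -q -m "first commit"

echo "I am file 1, version 2" > file1.txt   # modify a tracked file
echo "brand new" > thirdfile.txt            # create a file git has never heard of

git commit -q -a -m "bugfix on master"      # -a stages modified tracked files first

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)   # files in that commit
left_over=$(git status --porcelain)                             # what's still pending
echo "committed: $committed"
echo "left over: $left_over"
```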
Okay, so we’re back in Son Of Git, original home of feature1. Let’s see where we left off:
> git branch
feature1
feature2
* master
We’re in master, but that juicy feature2 over there is looking like fun. Let’s get over there:
> git checkout feature2
All is well, except we don’t have the latest bugfix we committed to the master — secondfile.txt is essential to our work! But feature2 isn’t ready yet. So instead, let’s merge the master into this feature branch, so we can use the bugfix:
> git merge master
Already up-to-date.
D’oh! Of course, that doesn’t work, because, remember, git always believes it is the authority. Master on this machine hasn’t changed since we created feature2, so it thinks everything is up to date. The solution is to switch back to master, and get it in sync with the server:
> git checkout master
> git pull origin master
remote: Counting objects: 13, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 10 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (10/10), done.
From git@github.com:seldo/git-training
* branch master -> FETCH_HEAD
Updating 6ebc96e..a62725d
Fast forward
feature1files/file1.txt | 2 +-
secondfile.txt | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)
create mode 100644 secondfile.txt
Again, this means “pull the latest changes in the branch called master from the remote repo called origin to our local machine”. Nothing magic about the names. With that done, let’s switch back to feature2 and try that merge again:
> git checkout feature2
> git merge master
Updating 6ebc96e..a62725d
Fast forward
feature1files/file1.txt | 2 +-
secondfile.txt | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)
create mode 100644 secondfile.txt
Wahey! Feature2 is now in sync with the server and all is well. Amazing! Now let’s go for something genuinely tricky:
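Before the tricky part, a shortcut for the dance you just did: instead of hopping to master, pulling, and hopping back, you can fetch from the server and merge its copy of master (which git calls origin/master) straight into your feature branch. A sketch under assumptions: a local bare repo in a temp directory stands in for github, "first" and "son" are scratch copies of the two working directories, and the "Demo" identity is invented.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/server.git"      # stand-in for the github repo (assumption)
git --git-dir="$work/server.git" symbolic-ref HEAD refs/heads/master
git init -q "$work/first"
cd "$work/first"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "first commit"
git remote add origin "$work/server.git"
git push -q origin master

git clone -q "$work/server.git" "$work/son"

# A bugfix lands on the server from "git the first"
echo "This is a second file" > secondfile.txt
git add secondfile.txt
git commit -q -m "bugfix on master"
git push -q origin master

cd "$work/son"
git config user.email "demo@example.com"
git config user.name "Demo"
git checkout -q -b feature2
git fetch -q origin                # updates origin/master; local master is untouched
git merge -q origin/master         # merge the server's master straight into feature2
ls
```

The local master in the clone stays stale until you pull into it, but feature2 already has the bugfix.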
Augh, the dread! The horror! The merge conflicts! The best solution for conflicts is to never have them, or if you must have them, keep them small: everybody should work on isolated features, commit often, merge often, etc. But it’s too late for that now. You’ve got a conflict. Time to fix it. First, let’s create the merge conflict. Over in “git the first”, edit secondfile:
> nano secondfile.txt
>> Prepend the line "I am the master title"
> git add secondfile.txt
> git commit -m "adding a title"
Created commit 97e64c8: adding a title
1 files changed, 2 insertions(+), 0 deletions(-)
> git push origin master
Counting objects: 5, done.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 309 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
a62725d..97e64c8 master -> master
So far so good. Now, over in “son of git”, do something suspiciously similar in your feature branch:
> git checkout feature2
> nano secondfile.txt
>> Prepend the line "I am the feature2 title"
> git add secondfile.txt
> git commit -m "feature2 has a title"
Now, go back to master and pull in the change from Git The First, so our local repository has a copy of that change.
> git checkout master
> git pull origin master
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From git@github.com:seldo/git-training
* branch master -> FETCH_HEAD
Updating a62725d..97e64c8
Fast forward
secondfile.txt | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
Remember, there’s no conflict here yet — your local master never changed, so all you did was pull in changes from the remote master. But now go over to your feature2 branch again:
> git checkout feature2
> git merge master
Auto-merged secondfile.txt
CONFLICT (content): Merge conflict in secondfile.txt
Automatic merge failed; fix conflicts and then commit the result.
Oh noes! You have the dreaded long lines of >>>>>s in secondfile.txt. No way around it: you edited the same line of the same file. Somebody has to win. Let’s make it be our feature. Edit the file:
> nano secondfile.txt
<<<<<<< HEAD:secondfile.txt
I am the feature2 title.
=======
I am the master title
>>>>>>> master:secondfile.txt
This is a second file, version 2
Here, because of the specific text we entered, it’s pretty clear what’s going on. The lower is what got pulled in from the master branch, the upper is what was in “HEAD”, i.e. the local branch. Delete the lines as appropriate, save, and then:
> git status
secondfile.txt: needs merge
# On branch feature2
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
#
# unmerged: secondfile.txt
#
To tell git you’ve merged things, you need to treat it like a fresh commit. So add, then commit:
> git add secondfile.txt
> git commit -m "merge fix"
> git status
# On branch feature2
nothing to commit (working directory clean)
Nice. We can push this to the server (still on our feature branch) so other people can use it:
> git push origin feature2
Counting objects: 8, done.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 454 bytes, done.
Total 4 (delta 2), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
* [new branch] feature2 -> feature2
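Back on the conflict itself: when you already know one side should win wholesale, you can skip hand-editing the markers. The sketch below recreates the same title conflict in a scratch repo (temp directory and "Demo" identity are assumptions), lists the conflicted files, and keeps the feature2 side with git checkout --ours. Be aware that --ours takes the entire file from your side, so it only suits cases where you don't want anything from the other branch's version of that file.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"
echo "This is a second file, version 2" > secondfile.txt
git add secondfile.txt
git commit -q -m "second file only"

git checkout -q -b feature2
printf 'I am the feature2 title\nThis is a second file, version 2\n' > secondfile.txt
git commit -q -a -m "feature2 has a title"

git checkout -q master
printf 'I am the master title\nThis is a second file, version 2\n' > secondfile.txt
git commit -q -a -m "adding a title"

git checkout -q feature2
git merge master >/dev/null 2>&1 || echo "merge stopped on a conflict"

conflicted=$(git diff --name-only --diff-filter=U)   # files still needing a merge
echo "conflicted: $conflicted"

git checkout --ours secondfile.txt    # keep feature2's side ("HEAD" in the markers)
git add secondfile.txt
git commit -q -m "merge fix"
title=$(head -n 1 secondfile.txt)
echo "winning title: $title"
```

The mirror image, git checkout --theirs, keeps the version from the branch being merged in.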
And back over on the master branch, we can pull in our new merged copy without fear, because we already handled the conflicts. So the new merge goes off without a hitch:
> git checkout master
Switched to branch "master"
Your branch is ahead of the tracked remote branch 'origin/master' by 4 commits.
> git merge feature2
Updating 97e64c8..a63623b
Fast forward
secondfile.txt | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
Now looking at secondfile.txt, you see it’s got the feature2 title. You’re merged, you can test and push back to the server:
> git push origin master
Total 0 (delta 0), reused 0 (delta 0)
To git@github.com:seldo/git-training.git
97e64c8..a63623b master -> master
If you look on github, you can now see secondfile.txt in the master branch, and it’s got the feature2 title in it. Because the merge was a fast-forward, master now points at the very same commit as feature2, so the “merge fix” commit does show up in master’s history. But as far as the master branch is concerned the conflict never happened: the commit that fixed the conflict was made in the feature2 branch. The only thing that happened in the master branch is a trouble-free merge. Whee!
What about other types of fuck-ups? There are so many…
You added that file you didn’t mean to. D’oh! You can reset the list of added files like so:
> git reset
This will undo all your adds, so you’ll need to re-do the ones you meant to add.
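If only one of the adds was a mistake, the hint in git status points at a gentler form: git reset HEAD followed by the file name unstages just that file. A sketch in a scratch repo; the temp directory, the "Demo" identity and the file names wanted.txt and unwanted.txt are all made up for the example.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "first commit"

echo "meant to add this" > wanted.txt
echo "oops" > unwanted.txt
git add wanted.txt unwanted.txt

git reset -q HEAD unwanted.txt      # unstage just this file; its contents are untouched

state=$(git status --porcelain)     # "A" = staged, "??" = untracked
echo "$state"
```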
You’ve edited a file, you don’t like it, you want it back the way it was. There are two ways. For just the one file, try:
> git checkout HEAD fuckedfile.rb
Or, if you’ve messed up everything in the entire working tree, here’s the nuclear option:
> git reset --hard HEAD
This will throw away EVERYTHING you’ve changed but not committed, so beware.
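Since the nuclear option has no undo, it can be worth looking at what it is about to destroy first. A sketch in a scratch repo (temp directory, "Demo" identity and the file names good.rb and other.rb are invented for the example): restore one file, preview the rest, then reset.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "puts 'fine'" > good.rb
echo "puts 'also fine'" > other.rb
git add good.rb other.rb
git commit -q -m "working versions"

echo "garbage" > good.rb
echo "more garbage" > other.rb

git checkout HEAD -- good.rb        # "--" marks what follows as a path, not a branch
restored=$(cat good.rb)

doomed=$(git status --porcelain)    # preview: everything listed here is about to go
echo "about to lose: $doomed"
git reset -q --hard HEAD            # throw away all uncommitted changes

echo "good.rb:  $restored"
echo "other.rb: $(cat other.rb)"
```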
This is really all you need to know to handle basic life in git. If you’ve got all this to work, you should understand what you’re doing. But I’ll add new stuff that’s more complicated when I find it.
When setting up a Facebook Connect site recently, I kept running into a strange error in my Firebug console: “setting a property that has only a getter”. It turns out this error is caused by Firebug itself, and has nothing to do with FBML. Just close Firebug and your code will work fine.
A few days ago an Aardvark user asked me a really common question: how should I promote my new website?
The web is awash with people claiming to be “social media experts” and “SEO consultants”. Their advice in my experience ranges from the unhelpful through the inaccurate all the way to the positively counter-productive. Many of them are snake-oil merchants who want to charge thousands of dollars to dispense obvious advice or to use shady techniques that will get your site blacklisted by the search engines for spam.
So here is my own obvious advice, for free. My credentials are more than a decade’s experience building websites large and small and a working knowledge of modern search engines*, but much more importantly than that, I am not trying to get you to pay me.
For the sake of this discussion I am assuming you are a low- or zero-budget site; if you have a lot of money, just buy an ad during the Super Bowl.
In short, there are three really good ways to make your website popular:
Let’s tackle these in turn.
There are two facets to SEO: I have previously covered making it easy for the software used by search engines to understand your website. The other facet, and by far the more important one, is improving your inbound links.
By far the best way to get traffic to your site is to get other sites on the same topic to link to you. It gets traffic directly — by visitors on those sites clicking the links — and it also gets you placed higher in search results. No amount of tweaking and fiddling will get around this fundamental fact: Google (and all other major search engines) decides how relevant your site is by how many other people link to it, and how relevant they are. So the best place to promote your website is on other websites about the same topic. But it’s important to do this in a friendly, respectful and above all non-spammy way. “Non-spammy” is tricky, but some good guidelines are: not too often, not too fast, not to too many people at the same time, and not to an audience that won’t appreciate it.
Some forums also have a “signature” that you can append to your messages, and it is sometimes considered acceptable to include a plug for your own site in there, but be careful, as other sites will consider it spam. As usual, the key is spending some time and getting to know the culture of the site.
These techniques are simple, but they require a lot of time and effort. They are also the most powerful. A single link from a popular site can mean thousands of visits, which could cost you hundreds of dollars if you were paying for ads.
The very phrase “social media optimization” reeks of sleaziness. Social media is much harder to get right, but also potentially more powerful: a single well-timed mention of a topical page on your website can go around the world, spreading virally to millions of people in just a few hours.
The two biggest social media channels are Facebook and Twitter, but these techniques can be adapted for almost any website where people have profiles and send messages to each other.
Create a Fan Page for your site. Use it to describe your site, and to mention useful, relevant content from your site. Encourage your existing users to add themselves as fans: that event turns up in their news feeds, which their friends see, and is a great way to spread the word about your site virally. People tend to have friends with similar interests, so the rate of response to a mention in a news feed is much better than an ad, no matter how well targeted.
You can use your fan page to send broadcast messages to your fans (though these are often ignored if they are too frequent, so use them sparingly). You can also post messages to your Wall, which is a great way to keep your page fresh and interesting to new fans.
Create a profile for your site. As with your Facebook Wall, tweet useful information about your topic. Uniquely to Twitter, you can use the ad-hoc conversational dynamic to your advantage: search for your topic, and you will find people talking about it. If you have something useful to add — and only then — you can reply to them publicly (an “@reply”). If they like your tweet, they’ll follow through to your profile and find your site. If it’s particularly helpful, informative or witty, they may even re-tweet your message, sending it out to all their followers.
@replies are really powerful, but again, don’t be spammy: don’t reply to everyone with the same generic tweet, and don’t blindly reply to anybody who mentions your keyword — make sure they’re actually interested in your topic.
Avoid viral gimmicks like “Retweet this to win prizes!” These get you a whole lot of publicity, but at much lower quality: these people aren’t genuinely interested in your topic, they just want your free iPod. You also run the risk of annoying potential users who receive the same tweet from ten different people.
And finally, the most important and powerful technique of all.
One of the oldest maxims of website promotion is that content is king. The Internet, in all its marvellous chaotic complexity, is really good at surfacing quality content. So make sure your site is well-written, easy to read, and frequently updated. The best way to get popular is to be genuinely useful, informative, or entertaining, and no amount of futzing around with social media is going to get around that.
It also helps if your site has a reasonably narrow, defined topic or range of topics: if it’s “my blog about everything” it’s going to be difficult to get anybody but your closest friends to read it, no matter what you do. Work out who you want your audience to be, and then come up with the most useful content you can possibly think of for them. They will reward you with real, long-lasting reputation and traffic.
* I work for Yahoo!, but not for the search division. I make no claims to any special knowledge about the mechanics of Yahoo search. Even if I did have any knowledge to that effect, sharing it on a public blog would be severely career-limiting.
In my last post, I covered the politics of search accessibility, and why making your site available to all users is above all the profitable thing to do, without considering whether it’s the right thing. So now I’m going to cover how to make your site search accessible.
The program that runs around the Internet reading every single page and throwing it into Google’s* giant database is GoogleBot (Yahoo!’s is called Slurp). GoogleBot is your best friend, your worst enemy, your teddy bear and your mommy all rolled into one. GoogleBot is a very, very clever piece of software, but it’s not magical. Here is what GoogleBot does:
Key take-home: it’s all about keywords and links. It is all about text. Attractive design and a witty site slogan and pictures of bikini models holding your product count for naught. As I mentioned in my last post, Google is in effect a disabled user using only the most basic of assistive technologies:
Already, some of the key things you need to do for SEO are obvious, in order of importance:
It’s important to know what Google considers an “important” word. Google is more than a little secretive about this, but Google has its own guidelines for site design and professional, non-evil SEO people have their own search accessibility guidelines. My own, very subjective impression from several years of experience, is that the most important words on your page are:
Another aspect of your page that is extremely important to Google is source code order: literally, the order things appear in your source. Things that appear early on are likely to be more important than things that appear later. That seems obvious, right? But now look at your code: you’ve got the head, full of juicy meta data, and then you’ve got 5k of navigational elements, sidebar text, various other cruft, just placed first because you were using a left-floated column and so it was easier to put it there. This is killing you.
What’s much worse is when your source code order physically separates content that is semantically related: for instance, your headline is at the top of your page, then you have 5k of navigational cruft, then you have your content. Google will either fail to realise that your headline is describing your content, and thus not link the words, or worse, it will decide that your page doesn’t actually have any content on it relating to your headline, and you’re trying to spam it. Danger, Will Robinson!
And of course the number one offender from this perspective is using tables for layouts. If you care about web development, you’re probably aware that tables have serious issues with flexible, attractive layouts. However, that’s usually not a good enough reason to take to your boss: after all, it doesn’t bother her that your job is hard. However, tell her that using tables is causing an 80% drop in traffic to your site (as I explained in the last post) and suddenly you have an easy, obvious business case for reworking the layout of your code.
Tables put data into a grid layout. If your data is in columns, and it frequently is, this means you often end up with a site code layout that looks like this:
Site logo Article headline
- List
- of
- nav
- links
Article body
To Google this reads like:
So you can see why Google might get confused. So examine your code, and put things in the order of importance: you can use CSS to move stuff around on the page later. Coincidentally, source code order is also the order in which screen readers will read out your page to a blind user. So once again there’s a useful coincidence of making your site accessible when you make it search accessible.
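To make the source-order point concrete, here is a rough sketch in Python that reads a page the way a very naive text-only reader would: top to bottom, ignoring layout. This is emphatically not how GoogleBot actually works (nobody outside Google knows that); the two markup snippets and the `read_in_source_order` helper are made up for illustration.

```python
from html.parser import HTMLParser


class SourceOrderText(HTMLParser):
    """Collects non-empty text chunks in the order they appear in the source."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def read_in_source_order(html):
    """Hypothetical helper: return the page text as a linear reader sees it."""
    parser = SourceOrderText()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up table layout: logo and headline in row one, nav and body in row two.
TABLE_LAYOUT = """
<table>
  <tr><td>Site logo</td><td>Article headline</td></tr>
  <tr><td><ul><li>List</li><li>of</li><li>nav</li><li>links</li></ul></td>
      <td>Article body</td></tr>
</table>
"""

# Made-up content-first layout: headline and body adjacent, cruft afterwards.
CONTENT_FIRST = """
<h1>Article headline</h1>
<div id="content">Article body</div>
<div id="logo">Site logo</div>
<ul id="nav"><li>List</li><li>of</li><li>nav</li><li>links</li></ul>
"""

table_order = read_in_source_order(TABLE_LAYOUT)
content_order = read_in_source_order(CONTENT_FIRST)
print(table_order)
print(content_order)
```

In the table version the nav links land between the headline and the body; in the content-first version the headline and body sit next to each other, and CSS can still put the nav wherever you like on screen.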
Of course — and I would have thought this was obvious, but I get questions about it that indicate to the contrary — you can use tables when the data is tabular. Don’t try to mark up your spreadsheet data using a series of stacked lists. Tables have real semantic meaning, but it has been diluted almost beyond help by their consistent misuse.
There is more I could tell you about SEO — the various hazily-defined statistical rules about how many links on a page is too many, optimal keyword density, and more, but these advanced techniques are icing on the cake, and the cake is made of search accessibility. It doesn’t matter what your keyword density is if Google can’t even get to your pages. So get out there and make the case for accessibility. And when the traffic is rolling in and your boss is giving you your huge bonus, you can get a tiny little extra bit of joy from knowing your site is also accessible to disabled users.
* When I say Google, obviously I mean Yahoo!, Ask and all the other major search engines as well. They all work the same way. If Google didn’t want me to use their name to mean all search engines, they shouldn’t have made it a verb.
** For important text like headlines, it’s often better to put the text into the page directly in a semantically-meaningful element (like H1, H2, etc), make the text transparent, and then put the nicely-styled image in as a background image. This makes no difference to what your users see but it makes the words look a lot more “important” to Google.
So I was at the @media America conference last week. There was much talk of accessibility and how to do it properly, when to do it, and even when not to do it. There was also talk about why to do it, but that’s where I think the speakers dropped the ball. Accessibility is not about helping disabled people: it’s about money, and you making more of it. (I’m going to use a lot of bold text in this post to emphasize stuff. That’s because it’s long, and you’re skim-reading. See, I know you.)
The dirty secret of accessibility, swept under the rug by many an evangelist, is that the cost of making your site accessible is relatively high: in my experience, something like 20% additional dev time on a new project, although experienced developers can bring this down, and the cost decreases dramatically for incremental updates once the project is up and running. But a 20% margin is definitely non-trivial. And if you’ve not been thinking in terms of accessibility from the start, this pricetag rises sharply: retrofitting accessibility often involves fundamentally reworking the architecture of a web page*. You’ll be looking at spending something like 50% of the time you spent originally developing the site on the retrofit. Ouch.
The other dirty secret of accessibility is that the number of disabled users is relatively low. Not tiny, but I often hear figures like “60% of Americans are disabled”, and while this is true, it’s disingenuous because that figure can include people like amputees or paraplegics who can use the web with no problems whatsoever. The truth is that somewhere between 10% and a maximum of 20% of your users will have trouble using your site without assistive technologies. This makes it a very close call, when starting a new project — serving 80% of your possible users doesn’t seem ideal, but is an acceptable loss to get it out of the door 20% faster, right? You can build the accessibility in later!
Except you can’t. After launch, you’ve got an inaccessible site and you’re facing a 50% dev time bill to retrofit that accessibility in: another 3 weeks on what was a 6-week project, just to get 20% more users? That makes no business sense: much better to build another project, and get another 80% of users in the door quickly.
This is the kind of unavoidable math that has made the web inaccessible today. And that’s the harsh truth: building in accessibility for disabled users does not make business sense. It’s still a good idea, a noble idea, but it’s not a financially sound one. This is true in the real world, too, which is why legislation was necessary to force everybody to put accessible toilets and wheelchairs in everywhere.
Of course, legislators have (eventually) worked out this problem, and as such there is already web accessibility legislation in place in many countries that makes it illegal to produce an inaccessible website. Problem solved! It’s the law! We have to do it! Right?
In an ideal world, yes. In the real world, the law is only patchily enforced. Only a few very large, very high-profile sites have been sued so far (plus some government sites). You can always fly under the radar, hope nobody notices, and not build in accessibility until they sue you. It’s a good gamble to make to avoid increasing the cost of your site by 50%, right? Again, the math defeats us.
But this is all very unsatisfying. You, the clever, compassionate, standards-compliant modern web developer, feel that this cold logic is intrinsically, morally wrong. So you make the case for accessibility: you try to inflate disabled user numbers (counterproductive; it will make your manager trust you less) and deflate the amount of time it will take to make it accessible (an even worse idea; now you’re missing deadlines because of “that damn accessibility stuff”, making your manager hate the whole idea).
So here’s how you, as a developer, can stay true to your noble impulses to build an elegant, accessible website: stop calling it accessibility.
Search Engine Optimization, or SEO, is the hot shit right now. Google is the Internet for a lot of people, and if Google can’t find it, then it doesn’t exist. Nobody goes deep-diving on a site to try and dig up information anymore. Either they type in their search terms and your site comes up with exactly what they need on that page, or they will never click your link. Sites these days get 50-90% of their traffic from search engines**, and the overwhelming majority of that is deep links to pages within the site.
So it’s absolutely imperative that search engines be able to access your site, and this isn’t just keywords on your home page: Google must be able to get at every single page of the site, every nook and cranny, and see every little bit of information. A site that can’t be indexed is throwing away up to 90% of its audience. In other words, this traffic is lost by sites that are not search-accessible. And there’s an interesting word in that phrase.
Here’s the final dirty little secret of this situation: Google is a disabled user. Or more accurately, Google has all the same limitations of somebody using assistive technologies:
Suddenly, the equation changes: at least 55% of your users need your site to be accessible, and possibly over 90% do. Only 10-20% of them need it to be accessible all the time, but that doesn’t matter, because up to 90% of your users will never even visit your site if it isn’t search accessible. This isn’t out of solidarity, or legislation. They simply won’t find it. Search accessibility is not an optional component, to be bolted on after the main launch. Chances are, if you haven’t got your search accessibility right, there will never be a second launch, because your site will fail.
How can I further underline the importance of search accessibility to a web-based business? Let’s turn the numbers around: you can more than double traffic to your website by making it search accessible. Does that sound like something you could take to your manager as a business case? Keep in mind, 50% traffic from search engines is an absolute minimum. If you’re getting 90% of your traffic from Google, then making yourself search accessible will result in a tenfold increase in traffic. Those sorts of numbers are why SEO is now big business, with a whole industry built around paying consultants to tell you how to get it right. That industry wouldn’t exist if they weren’t getting results.
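If you want to check that arithmetic, here is a minimal sketch. It assumes a simple model where a search-inaccessible site keeps all its non-search visitors and loses every search visitor; the `traffic_multiplier` function is just a name I made up for the calculation.

```python
def traffic_multiplier(search_share):
    """Return total traffic divided by non-search traffic alone.

    search_share is the fraction of a search-accessible site's visits that
    arrive from search engines (0.5 means 50%). Assumes an inaccessible site
    keeps its other visitors and gets none from search.
    """
    if not 0 <= search_share < 1:
        raise ValueError("search_share must be in [0, 1)")
    return 1 / (1 - search_share)


for share in (0.5, 0.75, 0.9):
    print(f"{share:.0%} from search -> {traffic_multiplier(share):.1f}x traffic")
```

At a 50% search share the multiplier is 2, and at 90% it is 10, which is where the “double” and “tenfold” figures come from.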
But you don’t need to pay somebody. Once you’ve got the big, obvious business case out of the way, and swallowed the bitter pill that doing things properly will take 20% longer, search accessibility is super-easy. For my own personal how-to for search accessibility, see my next post.
* For example, if you’ve put a lot of business logic into JavaScript to enable Ajax goodness, making it accessible often means moving this logic to the server-side, which means reimplementing in a different language entirely, which is terribly expensive. You can write Ajax accessibly, so that business logic is always on the server and Ajax is merely a bridge, but you have to be thinking about it from the start. And as we’ve already established, you didn’t do that.
** This figure is affected by the type of site, and the levels of traffic to that site. So your blog might get half its traffic from regular readers, but on an e-commerce site the figure is going to be 90%.
So you’ve heard about Konfabulator, that fabulous little engine running at the heart of Yahoo! Widgets. The Widgets are cute, and it all looks pretty simple to do. But why build one? Well, the honest answer is that quite often, you don’t. There are lots of times when making a web site instead would be quicker, easier and simpler. Widgets are not a web replacement. But there are lots of things that Widgets provide that a web page doesn’t:
Note what I said just now: applications, not toys or gimmicks. Widgets are not a web replacement. They’re not web 3.0. They are a completely new breed of desktop application: a term somebody threw out in the office as a joke was Desktop 2.0, but that’s really what we’re talking about. Persistent, performant, web-connected, and a damn sight easier to build than Java or C++ or any other desktop-application development language you care to name.
Widgets are the natural result of the explosion in the number of developers with web-centric skills since the late 90s. Any development environment that attracts large numbers of developers produces a push to increase the capabilities of that environment into other areas: this is why there is command-line PHP, GUIs written in Python and web apps written in VBScript. So it was inevitable that JavaScript would migrate to the desktop. Taking a bunch of other web technologies like XMLHttpRequest and the DOM along with it is good and important too, because Widgets aren’t just a different language for writing applications: they’re a new way of writing applications.
The thing about Widgets that endears them to me as a development platform is that they get software development the right way around: build the interface first, and hook the functionality into that. This is such a simple idea, and yet there’s no desktop development language that works this way: yes, there are visual editors for C++ and Java, but they are just working with a set toolkit: the code is actually central, and if you want to do something out of the ordinary with the interface, you have to go back to the code. Because the truth is that for the vast majority of applications, the processing that’s going on is minor, and all the value is in the interface: presenting the right information at the right time, in the right way. Widgets let you concentrate on that.
Of course, there are problems. JavaScript is a lot faster in Konfabulator than it is in any browser, but JavaScript is still an interpreted language, and that has performance implications. There are other problems, too: the documentation (a personal bugbear and current project of mine) still has some way to go before it’s up to the standards of developer.mozilla and PHP.net. You are still insulated from the operating system to a certain extent. And despite our best efforts, there will inevitably be some bugs.
Widgets are not the be all and end all. They do not replace desktop applications, nor do they replace web applications. Instead they complement both, and produce a useful middle ground, where a web dev like you can create something that looks and works like a desktop application.
Once upon a time, Perl was (and in fact remains) a perfectly capable language for writing web applications. But capable is not the same as suitable: it just wasn’t as good a choice for web applications as PHP, because even in version 2.0 of PHP you could do all the same things by using built-in functions, and people recognized that these things were faster and more reliable than building them themselves. The savings produced by not reinventing the wheel outweighed even the problems of switching languages.
Frameworks — like Ruby on Rails, and a raft of emerging PHP on Rails MVC frameworks, of which my favourite is CodeIgniter — are just the next generation of this principle. Where once we marvelled at how easy PHP makes it to query a database (only 5 lines of code!), now we can marvel at how easy a framework makes it (after you’ve set up your framework, only one line of code!).
A common issue experienced coders run into when they look at frameworks is that they will say “this framework is too heavy, I don’t need any of that stuff”. This is particularly likely if what they’re building is supposed to be an experiment or a “prototype”. Being too heavy is of course a perfectly valid criticism of some frameworks, depending on the nature of your web application. But to use the same excuse to brush off frameworks in general is dangerous, and the “prototype” excuse even more so.
You don’t need the overhead of a framework to build your single-page blog software: often, doing it from scratch would be shorter and possibly even faster. This is, in fact, a problem with the overly simple demos the frameworks often promote via screencasts to demonstrate their capabilities. What you need a framework for is when your application becomes more than a single page, and when there’s more than one developer working on it.
This is why the “prototype” argument is also a false one: when’s the last time you actually threw away a prototype? You build the base, it works, so you tell everyone in the office about it. So you add a few extra features, tidy things up, eventually things snowball and the whole thing goes into production. And by the time you do that, you have to start maintaining it, and you should have used a framework.
The reason you need frameworks is because there’s no such thing as a small application. There’s just baby applications, which like all babies are small, simple and cute, and old applications, which are bigger, uglier, and frequently stink.
There are two main branches of benefits of using frameworks:
The first benefit is the one the frameworks advertise in their screencasts, and often the area where they concentrate their further development efforts, building up complex AJAX and other functionality*. But it’s my strongly-held conviction that the second benefit is by far the greater: however, it’s achieved pretty much as soon as the framework is created, so it’s difficult for the developers working on the framework to remain excited about it. Fundamentally, significantly reduced maintenance cycles aren’t sexy, but they are useful as all hell.
Maintainability remains an enormous stumbling block in web development. It’s easy to write a monolithic, procedural script that handles all your data capture, validation, processing and output. As long as you keep your internal model of that script in your head, it’s even relatively easy to maintain. The problem turns up six months later: you’ve forgotten how you wrote the script, or worse, you’ve had to pass the code on to another developer, and they can’t make head or tail of it.
A framework gives you built-in breakpoints for effective debugging: if the return statement in your model code is returning accurate data, you know for sure the problem is presentational, and vice versa. The nature of framework URLs makes tracing execution whole orders of magnitude easier: you know automatically which function is being called in which controller, just from the URL. As your project begins to grow past the capabilities of a single developer, these features become essential, since your team members will be working with code they didn’t write. Frameworks by their nature provide your team-mates with a lot more information about function X than your average function name.
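Here’s a toy illustration of the URL point, sketched in Python rather than PHP so it stays self-contained. It assumes the segment-based convention of /controller/method/arguments; the `Blog` controller, the `CONTROLLERS` registry and the `dispatch` function are all hypothetical, and a real framework’s router does a great deal more than this.

```python
class Blog:
    """Hypothetical controller: each public method handles one kind of URL."""

    def view(self, post_id):
        return f"showing post {post_id}"

    def archive(self, year, month):
        return f"posts from {year}-{month}"


# Hypothetical registry mapping the first URL segment to a controller class.
CONTROLLERS = {"blog": Blog}


def dispatch(path):
    """Route /controller/method/arg1/arg2 to Controller().method(arg1, arg2)."""
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2:
        raise LookupError(f"no controller/method in {path!r}")
    controller_name, method_name, *args = segments
    controller_cls = CONTROLLERS.get(controller_name)
    # Refuse unknown controllers and anything that looks like a private method.
    if controller_cls is None or method_name.startswith("_"):
        raise LookupError(f"no route for {path!r}")
    method = getattr(controller_cls(), method_name, None)
    if not callable(method):
        raise LookupError(f"no route for {path!r}")
    return method(*args)


print(dispatch("/blog/view/42"))
print(dispatch("/blog/archive/2010/04"))
```

The useful part is the reverse direction: when /blog/archive/2010/04 misbehaves, anybody on the team knows to open the `Blog` controller and look at `archive`, without having to ask whoever wrote it.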
Speaking of which, some will say: what about naming conventions? Surely a framework is just a really elaborate naming convention, involving whole directories and files rather than just function names?
Not really. It’s a naming convention, but more importantly a coding convention: we don’t just specify what the function does, we specify where whole classes of related functions go, and the manner in which they interact. This provides those debugging benefits I mentioned earlier, and MVC also provides a structure that scales easily to applications with dozens of models and hundreds of views: modern web applications.
Frameworks provide something that has never before existed in the web development field: a convention that exists in more than one company. The more we as an industry use frameworks, the greater the network effect: it means code and debugging can work effectively across companies, and that new hires will be able to quickly understand the operation of your software and get productive faster.
Frameworks are a sign of a new maturity in the field of web development, a side-effect of the shift from writing “pages” to “sites” to “applications”. And it’s about time.
* I have yet to see a framework build AJAX that meets a high standard of web development. It doesn’t count as separating behaviour from content if your view has to make a bunch of ajax-specific calls and you end up with a bunch of inline onclick handlers.
** Unless you write consistent, comprehensive and up-to-the-minute documentation, of course. But nobody ever has. No, not even that guy.
So it may strike some as odd to start a blog about web dev and Widget dev by naming it after a Patrick Wolf song, but that’s what I’ve done, so you’re all going to have to learn to live with it. I chose the name partially because web dev is all about position, but mainly because I really like the song. I could make up some bullshit about how Widgets are also related to position in some way, but I really think that would be reaching.
I’m writing the web dev stuff because that’s what I was and what I still do sometimes, and I’m writing the Widget dev stuff because that’s what I am now. So the web stuff will likely be relatively advanced, and the Widget stuff relatively simple. In particular I’m hoping to write a lot of the Widget stuff from the perspective of a complete beginner, before that feeling wears off for me.
I’m aiming for one post a week. Let’s see how we go, shall we?