Ember.js, Google Ajax crawling and Rails

It is well known that Ajax websites are hard for search engines to crawl. When only small parts of a page are updated with JavaScript it might not be a big deal, but when you build your entire website with a framework like Ember.js, it is a problem if all the crawlers see are empty webpages.

Luckily, Google has published a specification of how you should make your Ajax webpages crawlable. I do not know whether other search engines besides Google support this specification.

In this article I will show how you can conform to this specification using Ember.js and Rails as the backend.

You can find the full source of this example on GitHub.

The example

The sample application I created is called ‘ItemApp’ and has a single ‘Item’ model and the associated controller. I’m not going to explain how I created it, but it should be obvious when you inspect the source.

Notably, I put my own ember.js, ember-data.js and handlebars-1.0.rc.1.js files in vendor/assets/javascripts in addition to adding the ember-rails gem to the Gemfile. By adding the js files yourself, you have more control over the version of Ember you’re running. Also, at the current pace of Ember development, the gem doesn’t always have the newest version. These files were compiled on January 10, 2013 from the master branches of ember and ember-data respectively.

The reason for including the ember-rails gem is that it nicely compiles the Handlebars templates. I added these lines to application.rb to configure the gem:

# Specify which variant of Ember to use (:development is the non-minified build)
# This line is only here to silence warnings: the app uses its own .js files in vendor/assets,
# which gives more control and currently more up-to-date files, as Ember.js is changing so rapidly
config.ember.variant = :production
# Make Handlebars template names cleaner by removing the app/templates prefix
config.handlebars.templates_root = 'app/templates'

The first statement is not really necessary as we roll our own ember.js file. However, when not present the gem tends to complain. The second line makes template names a bit shorter.

Also, I used the Ember router v2.

Side note: serializers

Ember downloads JSON representations of your models. Rails offers some basic functionality out of the box.

render json: @items

This will render the array of items with all their properties. This includes things like created_at and updated_at, which you might not want to send to the user. It is not easy to hide those columns without cluttering up your controller.

Luckily, there is ActiveModel::Serializer, which offers a lot more control and is really easy to use.

As an extra benefit, you don’t have to write

render json: {item: @item}

anymore. Previously this was necessary because Ember expects a root node and Rails doesn’t add one for single models. The serializer takes care of this, so you can just write

render json: @item
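To make the expected JSON shape concrete, here is a minimal plain-Ruby sketch of what the serializer achieves. It does not use the ActiveModel::Serializer API; the attribute values, the name column and the item_payload helper are made up for illustration.

```ruby
require "json"

# Hypothetical attributes of one Item record, as Rails would expose them.
ITEM_ATTRIBUTES = {
  "id" => 1,
  "name" => "First item",
  "created_at" => "2013-01-10T12:00:00Z",
  "updated_at" => "2013-01-10T12:00:00Z"
}.freeze

# Columns we assume should stay on the server.
HIDDEN_COLUMNS = %w[created_at updated_at].freeze

# Build the payload for a single item: drop the hidden columns and
# wrap what is left in the "item" root node that Ember expects.
def item_payload(attributes)
  visible = attributes.reject { |key, _value| HIDDEN_COLUMNS.include?(key) }
  { "item" => visible }
end

puts JSON.generate(item_payload(ITEM_ATTRIBUTES))
# prints {"item":{"id":1,"name":"First item"}}
```

With a serializer in place, this filtering and wrapping happens outside the controller, which is why the controller line can stay so short.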

Hash fragments

Google’s specification states that hash fragments should begin with an exclamation mark, so paths are prefixed with ‘#!’. By default Ember uses just a hash ‘#’ and it has no built-in way to add the exclamation mark automatically. You could add it to every route you specify, which would work, but it would not look very nice and would be hard to change later.

Instead, the router has a location property which specifies which location ‘engine’ it should use. Out of the box it has ‘hash’, ‘history’ and ‘none’. We’re going to add ‘hash_exclamation’ to that list.

I took the source of the ‘hash’ implementation and modified its methods a bit. (I removed the comments to make it more compact. The comments are present in the source file on GitHub.)

I added this file as lib/assets/javascripts/ember_hash_exclamation.js. This directory is not automatically imported, so you also have to add it in application.rb.

config.autoload_paths += %W(#{config.root}/lib)

After this, you can tell the router to use this location implementation like this:

App.Router = Ember.Router.extend({
  location: 'hash_exclamation'
});

Running JavaScript

As you can read in the specification, Google will pass the part after #! as an _escaped_fragment_ URL parameter and expects a fully rendered page in return. This is the hard part. You could rewrite your .handlebars templates as .html.erb and render the response from your controllers. But that will likely be a load of (boring!) work.

Instead, you can use a headless browser to run the Ember app and return the rendered page automatically. PhantomJS is such a headless browser and is available on all platforms. You can find installation instructions on their website.

PhantomJS lets you run JavaScript scripts, so I created a script that loads a specified page, runs its JavaScript and then outputs the resulting HTML via the console.

The script has a somewhat complicated method of detecting whether the page has finished running. To my knowledge there is no callback or event that indicates that a page has finished loading AND running. Therefore I let the script wait until all resources have loaded and then give the page a couple more seconds to finish rendering. This is very hacky. If you know a better way to do this, let me know! @pieter_jongsma

You can run the script like this (once you have PhantomJS installed):

phantomjs ember_to_static.js [url] [silence]

The silence parameter will suppress info messages.

Catching _escaped_fragment_

Now we need to instruct Rails to run this script whenever a search engine requests a page with _escaped_fragment_.

For this, I added a catch_escaped_fragement method as a before_filter method.

This method should be quite straightforward. When it detects an _escaped_fragment_ parameter, it builds a new URL in which the value of _escaped_fragment_ is moved behind the #!. It then requests that page using the script above (which I placed in lib/phantomjs/ember_to_static.js). Because of the silence parameter, the script outputs just the HTML of the fully rendered page, which the method then renders as the response.
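The URL rewriting step can be sketched in plain Ruby. This is a minimal sketch under stated assumptions, not the method from the repository: hashbang_url_for and phantomjs_command are hypothetical helper names, the example URLs are made up, and the value passed for the silence argument is a placeholder.

```ruby
require "uri"

ESCAPED_FRAGMENT_PARAM = "_escaped_fragment_"
# Location of the PhantomJS script as mentioned in the text.
SCRIPT_PATH = "lib/phantomjs/ember_to_static.js"

# Hypothetical helper: turn the URL a crawler requested into the "#!" URL
# a browser would use. Returns nil when there is no _escaped_fragment_.
def hashbang_url_for(crawler_url)
  uri = URI.parse(crawler_url)
  pairs = URI.decode_www_form(uri.query.to_s) # also URL-decodes the values
  fragment_pair = pairs.find { |key, _value| key == ESCAPED_FRAGMENT_PARAM }
  return nil if fragment_pair.nil?

  # Keep any other query parameters, drop the crawler-specific one.
  remaining = pairs.reject { |key, _value| key == ESCAPED_FRAGMENT_PARAM }
  uri.query = remaining.empty? ? nil : URI.encode_www_form(remaining)
  uri.fragment = nil
  "#{uri}#!#{fragment_pair.last}"
end

# Hypothetical helper: argument list for running the script.
# "true" stands in for the silence argument; the real value is an assumption.
def phantomjs_command(url)
  ["phantomjs", SCRIPT_PATH, url, "true"]
end

url = hashbang_url_for("http://localhost:3000/?_escaped_fragment_=%2Fitems%2F1")
puts url
# prints http://localhost:3000/#!/items/1
puts phantomjs_command(url).inspect
```

In a before_filter, the argument list could be handed to something like IO.popen(phantomjs_command(url), &:read), which passes the URL as a separate argument instead of interpolating it into a shell string, and the returned HTML could then be rendered as the response.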

Rails concurrency

You might think you’re done now (I did), but if you try to run this, the requests will hang. This is because a single Rails instance, as started here, handles only one request at a time. In the catch_escaped_fragement method we request a new page from our own server and wait for the result. But that Rails instance can’t start on the new request until the method has finished, so both requests hang.

A straightforward way to solve this is to run multiple Rails instances. You could do this on different ports, but that is quite cumbersome. Instead, we can use something like Passenger. Passenger creates a pool of Rails instances and hands each new request to whichever one is free.

Passenger is available as a gem, so simply add

gem 'passenger'

to your Gemfile. Then you can run it like this

passenger start -p 3000 -a 127.0.0.1

(It will run an installation script the first time.)

Conclusion

That’s it! This will render your JavaScript pages for crawlers without having to rewrite them as server-side templates. It isn’t as fast, but it might save you a ton of work.

Sources: