Seo Server is a command line tool that runs a server allowing GoogleBot (and any other crawler) to crawl your heavily JavaScript-based websites. The tool works with very few changes to your server- or client-side code.
- Install CoffeeScript (if not already installed)
npm install -g coffee-script
- Edit configuration file
src/config.coffee.sample
and save it as src/config.coffee
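The sample file defines the available settings; the field names below are hypothetical, just a rough sketch of what such a config might contain, assuming it covers the listening port and the memcached connection:

```coffeescript
# Hypothetical sketch only -- the authoritative fields are in src/config.coffee.sample
module.exports =
  port: 10300             # port the seoserver listens on (assumed field name)
  memcached:
    host: 'localhost'     # memcached host (assumed field name)
    port: 11211           # memcached default port
    lifetime: 86400       # cache TTL in seconds (assumed field name)
```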
- Compile the config into the project directory
coffee --output lib/ -c src/config.coffee
- Install npm dependencies
npm install
- Install PhantomJS
npm install -g phantomjs
- Start the main process on port 10300 with the default memcached configuration:
bin/seoserver start -p 10300
The crawler has three parts:
lib/phantom-server.js A small PhantomJS script that fetches a page and returns the rendered response along with the response headers in serialized form. It can be executed via:
phantomjs lib/phantom-server.js http://moviepilot.com/stories
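The actual script ships with the project; as a minimal sketch of the idea (the render delay and the header capture below are assumptions, not the project's code):

```javascript
// Minimal sketch of a PhantomJS fetcher, not the project's actual phantom-server.js
var page = require('webpage').create();
var system = require('system');
var url = system.args[1];
var headers = {};

// Capture the response headers of the main document
page.onResourceReceived = function (resource) {
  if (resource.url === url) {
    resource.headers.forEach(function (h) { headers[h.name] = h.value; });
  }
};

page.open(url, function (status) {
  if (status !== 'success') {
    console.log('Failed to load ' + url);
    phantom.exit(1);
    return;
  }
  // Give client-side JavaScript a moment to render before snapshotting (assumed delay)
  setTimeout(function () {
    console.log(JSON.stringify(headers)); // serialized response headers
    console.log(page.content);            // fully rendered HTML
    phantom.exit();
  }, 1000);
});
```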
lib/seoserver.js A node express app responsible for accepting requests from Googlebot, checking whether a cached version exists in memcached, and otherwise fetching the page via phantom-server.js.
You can start it locally with:
node lib/seoserver.js start
And test its output with:
curl -v http://localhost:10300
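The essential cache-or-render flow, as a rough sketch (the target host, TTL, and error handling here are assumptions, not the project's actual code):

```javascript
// Rough sketch of the cache-or-render flow; hostnames and TTL are assumptions
var express = require('express');
var Memcached = require('memcached');
var execFile = require('child_process').execFile;

var app = express();
var memcached = new Memcached('localhost:11211'); // assumed memcached address

app.get('*', function (req, res) {
  var url = 'http://moviepilot.com' + req.url;    // assumed target host
  memcached.get(url, function (err, cached) {
    if (!err && cached) return res.send(cached);  // cache hit
    // Cache miss: render the page with PhantomJS
    execFile('phantomjs', ['lib/phantom-server.js', url],
      { maxBuffer: 10 * 1024 * 1024 },            // rendered pages can be large
      function (err, stdout) {
        if (err) return res.status(503).send('Rendering failed');
        memcached.set(url, stdout, 86400, function () {}); // assumed 24h TTL
        res.send(stdout);
      });
  });
});

app.listen(10300);
```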
bin/seoserver A forever-monitor script for launching and monitoring the main node process.
bin/seoserver start -p 10300
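This is a thin wrapper around the forever-monitor package; a launcher might look roughly like this (the restart limit and option values are assumptions):

```javascript
// Sketch of a forever-monitor launcher; the restart limit is an assumption
var forever = require('forever-monitor');

var child = new forever.Monitor('lib/seoserver.js', {
  max: 10,        // give up after 10 restarts (assumed)
  silent: false,  // pass the child's stdout/stderr through
  args: []
});

child.on('exit', function () {
  console.log('seoserver exited after the maximum number of restarts');
});

child.start();
```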
Your webserver has to detect incoming search engine requests in order to route them to the seoserver. One way of doing so is to look for the string "bot" in the User-Agent header, or to check for Google's _escaped_fragment_ parameter. In Nginx you can check the variable $http_user_agent and set the backend like this:
```nginx
location / {
  proxy_pass http://defaultbackend;
  if ($http_user_agent ~* bot) {
    proxy_pass http://seoserver;
  }
}

location ~* escaped_fragment {
  proxy_pass http://seoserver;
}
```
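You can verify the routing by sending a request with a crawler-like User-Agent (replace yoursite.example with your own host):
curl -H "User-Agent: Googlebot" http://yoursite.example/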
If you deliver a cached version of your website with a reverse proxy in front, you can do a similar check there. A VCL example for Varnish, tagging crawler requests so they can be cached separately from regular ones:
```vcl
if (req.http.User-Agent ~ "bot" || req.url ~ "escaped_fragment") {
  set req.http.UA-Type = "crawler";
} else {
  set req.http.UA-Type = "regular";
}
```
This code is based on a tutorial by Thomas Davis and on https://github.com/apiengine/seoserver.