Home

Welcome to the disclosed.ca wiki!

Introduction

Disclosed.ca is a portal for searching and visualizing Canadian government contracts. It uses the limited data that government websites publish electronically. Users of disclosed.ca can search contracts by keyword, agency, and vendor. It is also possible to see a breakdown of government spending by agency and to view the contract information on the government website.

The project was started in April 2008 by Nurey Networks, a web development consultancy. One of the goals was to try out Google App Engine, an experimental new service and software development kit that Google had announced at the time.

As of June 10, 2009, the project is affiliated with VisibleGovernment, a non-profit organization promoting online tools for government transparency.

Technical Specifications

Data acquisition

For the most part, each government agency follows guidelines that detail what information should be published and how it should be presented. With a few exceptions, a Contract has a fixed set of properties:

Vendor name
Reference Number
Contract date
Description of work
Contract period or delivery date
Contract value
Comments
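
As an illustration only, the properties above could be modelled as a small record type. This is a sketch in Python, not the project's actual model: the field names and the parse_value helper are hypothetical, and dates are kept as plain strings because the source does not say how agencies format them.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Contract:
    """One disclosed contract; field names are illustrative, not the project's."""
    vendor_name: str
    reference_number: str
    contract_date: str          # kept as text; agencies may format dates differently
    description: str            # description of work
    contract_period: str        # contract period or delivery date
    contract_value: Decimal
    comments: str = ""


def parse_value(raw: str) -> Decimal:
    """Turn a string such as '$12,345.67' into a Decimal amount."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned)


example = Contract(
    vendor_name="Example Vendor Inc.",      # made-up sample data
    reference_number="REF-0001",
    contract_date="2008-04-01",
    description="Consulting services",
    contract_period="2008-04-01 to 2008-06-30",
    contract_value=parse_value("$12,345.67"),
)
```

Using Decimal rather than float avoids rounding surprises when contract values are later summed per agency.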

One could manually download each contract using a browser, but this quickly becomes impractical because of the sheer number of contracts.
Instead, a program automates the process of downloading the contracts, extracting the properties that we are interested in, and writing these properties to a file. This program is called the scraper. It is implemented in Perl.
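
The extract-and-write steps can be sketched as follows. This is an illustration in Python, not the Perl scraper itself. It assumes a hypothetical contract page that lays out each property as a table row with a th label and a td value; real agency pages differ, which is what agencies.yaml accounts for. The download step is omitted, and the class and function names are made up for this example.

```python
import csv
import io
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect <th>label</th><td>value</td> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None    # "th" or "td" while inside such a cell
        self._label = None   # text of the most recent <th>
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        # Collapse runs of whitespace inside the cell.
        text = " ".join("".join(self._text).split())
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._cell = None


def contract_to_csv_row(html):
    """Return the extracted fields and the same values as one CSV line."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerow(parser.fields.values())
    return parser.fields, out.getvalue()


sample = (
    "<table>"
    "<tr><th>Vendor name</th><td>Example Vendor Inc.</td></tr>"
    "<tr><th>Contract value</th><td>$1,000.00</td></tr>"
    "</table>"
)
fields, row = contract_to_csv_row(sample)
```

The csv module quotes the value containing a comma, so amounts such as $1,000.00 survive a round trip through the file.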

The guidelines state that the contracts are to be published quarterly. This means the scraper must be run at least once every quarter.
There’s a configuration file, agencies.yaml, which defines how each agency’s website is to be scraped. See the technical documentation in Agency.pod for scraper implementation details.

The scraper produces a file for each agency in CSV format. This is a widespread format which can be opened in Microsoft Excel, Numbers, OpenOffice, etc. All the CSV files are collected into the “data” directory.
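
A minimal sketch of consuming these files, assuming (hypothetically) that each CSV has a header row with a contract_value column and that the file name identifies the agency, for example oag.csv. The actual column layout produced by the scraper is not documented here, so adjust the names to match the real files.

```python
import csv
from decimal import Decimal, InvalidOperation
from pathlib import Path


def total_by_agency(data_dir):
    """Sum contract values per CSV file (one file per agency)."""
    totals = {}
    for path in sorted(Path(data_dir).glob("*.csv")):
        total = Decimal("0")
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # "contract_value" is an assumed column name.
                raw = (row.get("contract_value") or "")
                raw = raw.replace("$", "").replace(",", "").strip()
                try:
                    total += Decimal(raw)
                except InvalidOperation:
                    continue  # skip blank or non-numeric values
        totals[path.stem] = total
    return totals
```

Calling total_by_agency("data") would return a dictionary keyed by file name without the extension, which is the kind of per-agency total the web frontend charts.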

Web Frontend

The goal of the web frontend is to break down the scraped contract data into meaningful information and present it on the web. To that end, the front page combines:

the search engine, a full-text search implemented using Google App Engine’s SearchableModel.
an example pie chart showing the top 10 agencies. The area of each slice indicates that agency’s total spending in dollars.
a tag cloud of agencies, weighted by total spending in dollars. Each tag is an agency name, and a bigger font size indicates more spending. Hover over a tag to reveal the exact dollar amount. Click on a tag to see contracts by that agency.
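
The pie chart and tag cloud both start from per-agency spending totals. The sketch below shows one way to pick the top 10 agencies and to turn totals into font sizes. It assumes linear scaling between a minimum and maximum pixel size; the source only says that bigger fonts mean more spending, so the scaling rule, the pixel bounds, and the function names are all assumptions.

```python
def top_agencies(totals, limit=10):
    """Return the `limit` highest-spending (agency, total) pairs, largest first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def font_sizes(totals, min_px=12, max_px=36):
    """Scale each agency's total spending linearly into a font size in pixels."""
    if not totals:
        return {}
    low, high = min(totals.values()), max(totals.values())
    span = high - low
    sizes = {}
    for agency, amount in totals.items():
        # With a single agency (or identical totals) use the largest size.
        weight = (amount - low) / span if span else 1.0
        sizes[agency] = round(min_px + weight * (max_px - min_px))
    return sizes


# Made-up totals for illustration only.
sample_totals = {"Agency A": 100, "Agency B": 300, "Agency C": 200}
sizes = font_sizes(sample_totals)
top_two = top_agencies(sample_totals, limit=2)
```

If a few agencies dominate spending, a logarithmic scale may give a more readable cloud than the linear one sketched here.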

Setting up a development environment

Install git and check out the code

git clone git://github.com/nurey/disclosed.git

Google App Engine SDK

Download the SDK from Google
cd disclosed/app2
make app.yaml (this generates app.yaml)
./manage.py runserver (this starts the development web server)
Alternatively, on OS X you can use GoogleAppEngineLauncher to launch the local web server:
  1. Right-click and choose Add Existing
  2. For the path, choose the path to disclosed/app2

Perl and Dependencies

Perl 5.10 is recommended

Install cpanminus to make installing modules easier (recommended by cpan.org)

sudo cpan App::cpanminus

Install carton to manage dependencies (Carton is to Perl what Bundler is to Ruby)

sudo cpanm Carton

Install the Perl libraries that the scraper depends on (GOAT_HOME must point to your checkout of the repository; it is exported in the Scraping step)

cd $GOAT_HOME/scraper
carton install

Scraping

export GOAT_HOME=/checkout/of/git/clone
carton exec perl scrape_agency.pl oag # this would scrape 'Office of the Auditor General of Canada'
