Home
Disclosed.ca is a portal for searching and visualizing Canadian government contracts. It uses the limited data that is available electronically on government websites. Users of disclosed.ca can search contracts by keyword, agency, and vendor. They can also see a breakdown of government spending by agency and view each contract's original record on the government website.
The project was started in April 2008 by Nurey Networks, a web development consultancy. One of the goals was to try out an experimental service and software development kit that Google had just announced: Google App Engine.
As of June 10, 2009, the project is affiliated with VisibleGovernment, a non-profit organization promoting online tools for government transparency.
For the most part, each government agency follows guidelines that detail what information should be published and how it should be presented. With a few exceptions, a Contract has a fixed set of properties:
- Vendor name
- Reference Number
- Contract date
- Description of work
- Contract period or delivery date
- Contract value
- Comments
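As a rough illustration, the properties above could be modelled as a single record type. This is only a sketch: the field names, the types, and the `parse_value` helper are assumptions made for this example and are not taken from the project's code.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Contract:
    """One disclosed contract. Field names are illustrative, not the project's."""
    vendor_name: str
    reference_number: str
    contract_date: date
    description: str
    # Kept as free text because agencies publish either a period or a single date.
    period_or_delivery_date: str
    contract_value: Decimal
    comments: Optional[str] = None


def parse_value(raw: str) -> Decimal:
    """Turn a display string such as '$12,345.67' into a Decimal (hypothetical helper)."""
    return Decimal(raw.replace("$", "").replace(",", "").strip())
```

Using `Decimal` rather than `float` for the contract value avoids rounding drift when many dollar amounts are summed.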
One could download each contract manually using a browser, but the sheer number of contracts makes this impractical.
Instead, a program automates downloading the contracts, extracting the properties we are interested in, and writing those properties to a file. This program is called the scraper. It is implemented in Perl.
The guidelines state that contracts are to be published quarterly. This means the scraper must be run at least once every quarter.
A configuration file, agencies.yaml, defines how each agency’s website is to be scraped. See the technical documentation in Agency.pod for scraper implementation details.
The scraper produces a file for each agency in CSV format. This is a widespread format which can be opened in Microsoft Excel, Numbers, OpenOffice, etc. All the CSV files are collected into the “data” directory.
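The per-agency CSV files can be summarized with a few lines of Python. The sketch below is not part of the project. It assumes that each file in the data directory is named after its agency (for example `oag.csv`) and that it has a header row with a `contract_value` column. Both are assumptions made for illustration, so adjust them to match the real files.

```python
import csv
from decimal import Decimal
from pathlib import Path


def total_by_agency(data_dir, value_column="contract_value"):
    """Sum contract values per CSV file; the file stem is used as the agency key.

    The column name is an assumed default, not the scraper's documented output.
    """
    totals = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        total = Decimal("0")
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Strip currency formatting such as "$1,234.50" before parsing.
                raw = (row.get(value_column) or "").replace("$", "").replace(",", "").strip()
                if raw:
                    total += Decimal(raw)
        totals[csv_path.stem] = total
    return totals
```

Calling `total_by_agency("data")` would return a dictionary mapping each agency key to its total spending, which is the kind of aggregate the frontend charts are built on.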
The goal of the web frontend is to break down the scraped contract data into meaningful information and present it on the web. To that end, the front page combines:
- a search engine. This is a full-text search engine implemented using Google App Engine’s SearchableModel.
- an example pie chart, which shows the top 10 agencies. The area of each slice indicates that agency’s total spending in dollars.
- a tag cloud of agencies, which weights each agency by its total spending in dollars. Each tag is an agency name. A bigger font size indicates more spending. Hover over a tag to reveal the exact dollar amount. Click on a tag to see contracts by that agency.
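The pie chart and tag cloud described above both start from per-agency spending totals. The sketch below shows one way to pick the top agencies and to map totals to font sizes. It is an illustration only: the function names, the pixel range, and the linear scaling are assumptions, and the site may weight tags differently.

```python
def top_agencies(totals, n=10):
    """Return the n (agency, total) pairs with the highest spending, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def tag_font_sizes(totals, min_px=12, max_px=36):
    """Map each agency's total to a font size, scaled linearly between min_px and max_px."""
    if not totals:
        return {}
    low, high = min(totals.values()), max(totals.values())
    span = high - low
    sizes = {}
    for agency, amount in totals.items():
        # If every agency spent the same amount, give all tags the largest size.
        weight = (amount - low) / span if span else 1.0
        sizes[agency] = round(min_px + weight * (max_px - min_px))
    return sizes
```

With linear scaling, a few very large agencies can push every other tag down to the minimum size. A logarithmic scale is a common alternative when spending is heavily skewed.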
git clone git://github.com/nurey/disclosed.git
- download from Google
- cd disclosed/app2
- make app.yaml (this generates app.yaml)
- ./manage.py runserver (this starts the development web server)
- OR you can use GoogleAppEngineLauncher on OSX to launch the local web server
  - right click: Add Existing
  - for the path, choose the path to disclosed/app2
Perl 5.10 is recommended
Install cpanminus to make installing modules easier (recommended by cpan.org)
sudo cpan App::cpanminus
Install carton to manage dependencies (Carton is to Perl what Bundler is to Ruby)
sudo cpanm Carton
Install the Perl libraries that the scraper depends on
export GOAT_HOME=/checkout/of/git/clone
cd $GOAT_HOME/scraper
carton install
carton exec perl scrape_agency.pl oag # this would scrape 'Office of the Auditor General of Canada'