Skip to content

A google Cloud Run service to periodically snapshot the Colorado Secretary of State's voter file into your BigQuery data warehouse

Notifications You must be signed in to change notification settings

bengen343/co-voters-to-gbq

Repository files navigation

Project logo

Colorado Voter File Snapshot


A Google Cloud Run service to periodically download the Colorado voter file and snapshot the change history to Google BigQuery.

Table of Contents

About

The Colorado Secreatry of State's office makes a monthly updated list of all registered voters in the state available.

This repository creates a Google Cloud Run service that can be run on a schedule or on demand to retrieve the Secretary of State data, upload it your Google BigQuery instance, and record the changes made to the records for all individual voters over time.

Getting Started

These instructions will get your copy of the project up and running on your local machine for development and testing purposes.

This repository depends on certain user defined variables that need to be configured in two places: config.py & .env

config.py will generally contain the latest values that I am using to run this repository. A careful read and update to your own values should be all that's needed to update this file. The variables contained here are mostly file and project names.

This repository includes a template for .env as the file .env.template. If you are attempting to run this repository locally and not as a Google Cloud Run service you will need to modify .env.template to contain your secret variables such as the strings for your Secretary of State ftp user name and password as well as your Google Service Account credentials. Once you have done so, save .env.template as .env

Prerequisites

This repository is meant to be run as a Google Cloud Run service. That means you will need your own Google Cloud login and several Google Cloud services enabled.

  1. Google Cloud Run
  2. Google Big Query
  3. Google Cloud Scheduler

This repository uploads this information to our Google BigQuery. If you do not have your own BigQuery instance it won't work.

Last, but not least, you need to have your own authorization and credentials to connect to the Colorado Secreatary of State's FTP service to download voter data.

Unfortunately, this means this repository will only serve demonstrative purposes for most users.

Installing

  1. Clone this repository to your local machine.
git clone https://github.com/bengen343/co-voters-to-gbq.git
  1. Update variables within config.py to name your outputs and correctly reference file origins/destinations.

  2. Update .env.template with values for each of the variables requested. Note that the variable BQ_ACCOUNT_CREDS should be the entire output of the json credential file associated with the Google service account you're using for this project. Save this file as .env when you've added all your values.

  3. Create a virtual environment for the repository, probably by running python3 -m venv .venv in the terminal from the local directory.

  4. Activate the virtual environment, probably by running .venv\Scripts\activate.bat on Windows or source .venv/bin/activate on Mac.

  5. Install the required dependencies by running pip install -r requirements.txt

Deployment <a name = "deployment>

This repository is meant to be run as a Google Cloud Run Service. Later it will be converted to a Google Cloud Run Job. At present, the Colorado Secretary of State FTP server only allows connections from whitelisted IP addresses. As such, the Google Cloud Run service needs to carry a dedicated IP address as outlined in this Goolge guide: https://cloud.google.com/run/docs/configuring/static-outbound-ip

Built Using

Authors

About

A google Cloud Run service to periodically snapshot the Colorado Secretary of State's voter file into your BigQuery data warehouse

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published