Architecture of 2.0

Version 2.0 of the CCJ project is structured differently from 1.0. The architecture is divided into five components: the scraper, the database, some middleware, the raw inmate data, and the RESTful API.

The raw inmate data

The raw inmate data is a set of directories located at the inmate data route, one for every year that we have data for. For example, all the data collected during 2014 is stored in a directory called 2014. Inside each year directory is a set of files, one per day, named in MM-DD.csv format. Each file holds the inmate data collected on that day. You can read more about how the CSV files are structured in their specification.
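The year/day layout described above can be sketched as a small path helper. This is only an illustration: the function name `daily_file_path` and the root directory `raw_inmate_data` are assumptions, not names taken from the project.

```python
from datetime import date
from pathlib import Path


def daily_file_path(data_root: Path, day: date) -> Path:
    """Return <data_root>/<YYYY>/<MM-DD>.csv for the given day."""
    return data_root / f"{day.year:04d}" / f"{day:%m-%d}.csv"


# "raw_inmate_data" is a placeholder root, not the project's real location.
example = daily_file_path(Path("raw_inmate_data"), date(2014, 6, 6))
print(example.as_posix())
```

Zero-padding the month and day keeps the files in chronological order when a year directory is listed alphabetically.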

The Scraper

The scraper is run every day. During its run it creates a new file in the raw inmate data. If it ran on June 6, 2014, it would create a new file named 06-06.csv in the 2014 directory. In that file it stores all the records it collects from the Sheriff's website, adding one row for every inmate.
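A minimal sketch of the file-writing step of a daily run might look like the following. It does not show the scraping itself, and it is not the project's actual scraper: the column names in `FIELDS`, the header row, and the function name `write_daily_file` are all assumptions made for illustration. The real columns are defined in the CSV specification.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical column names; the real ones come from the CSV specification.
FIELDS = ["inmate_id", "booking_date"]


def write_daily_file(data_root: Path, day: date, records: Iterable[Dict[str, str]]) -> Path:
    """Write one row per inmate record to <data_root>/<YYYY>/<MM-DD>.csv."""
    year_dir = data_root / str(day.year)
    # Create the year directory on the first run of a new year.
    year_dir.mkdir(parents=True, exist_ok=True)
    path = year_dir / f"{day:%m-%d}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()  # Assumes the files carry a header row.
        writer.writerows(records)
    return path
```

Calling `write_daily_file(Path("raw_inmate_data"), date(2014, 6, 6), records)` would produce `raw_inmate_data/2014/06-06.csv`, matching the example in the text.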

The API

The API reads the files located in the raw inmate data and populates its database from them. Once the database is populated, you can access it through the API's different entry points.
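The population step can be sketched roughly as below. This is an assumption-heavy illustration rather than the API's real loader: it uses SQLite from the Python standard library, a made-up table `inmate_records`, and the same hypothetical columns (`inmate_id`, `booking_date`) with a header row in each CSV file.

```python
import csv
import sqlite3
from pathlib import Path


def load_raw_data(data_root: Path, conn: sqlite3.Connection) -> int:
    """Load every <YYYY>/<MM-DD>.csv under data_root; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inmate_records "
        "(collected_on TEXT, inmate_id TEXT, booking_date TEXT)"
    )
    loaded = 0
    for csv_path in sorted(data_root.glob("*/*.csv")):
        # Rebuild the collection date from the directory and file names.
        collected_on = f"{csv_path.parent.name}-{csv_path.stem}"
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                conn.execute(
                    "INSERT INTO inmate_records VALUES (?, ?, ?)",
                    (collected_on, row["inmate_id"], row["booking_date"]),
                )
                loaded += 1
    conn.commit()
    return loaded
```

Because the collection date is encoded only in the path, the loader derives it from the directory and file names instead of expecting it inside each row.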

[Image: architecture diagram]
