Tim Erickson edited this page Feb 12, 2019 · 4 revisions

BARTY

This plugin is known as barty (for BART-year, as opposed to our first BART plugin, which has data for only one day).

BARTy gives users access to (at this writing) one year of BART data: for each hour, and each pair of stations, you can learn the number of people who entered the system at the first station and exited at the second. This works out to over 10 million cases.

The designer's task, therefore, is to give users easy-to-understand access to this large data set, and to give them useful choices about what cases to download, when they can only practically work with a few thousand cases at a time.

Why barty is so cool for DSE

The data are more multidimensional than they sound, so users have to think hard about what data to ask for; and then CODAP gives them the chance to fix their mistakes easily and improve what they are doing. Also, because of that multidimensionality, there are many ways to organize data and compute aggregate measures. So there is a lot of flexibility in what data moves are possible or useful.

For example, suppose you have the task of figuring out how many people took BART to attend Pride, which is on a Sunday in June. You need to consider questions such as:

  • What destination stations should you look at?
  • Do "source" stations matter?
  • What times should you look at?
  • Do you look at arrivals or departures or both?
  • Should you compare the data with a non-Pride day? Which day? Only one?

Of course, Tim prefers that we not give students these questions but let them come up with them themselves in a process of looking critically at partial solutions.

The files

  • barty.html: The overall UI, etc.
  • barty.js: this contains the central initialization method, and defines the global barty.
  • barty.constants.js: contains barty.constants. Here is where whence and the php paths are defined.
  • barty.css: styling the html
  • barty.ui.js: vast file that adjusts visibility, contents of controls, formats of strings, etc., and responds to user changes to their selection criteria. In later plugins, some of this would be in a userActions file.
  • bartyManager.js: the main controller, defines barty.manager. Importantly, assembles the POST parameters its method doBucketOfData sends to PHP. NOTE: this whole mechanism ought to be updated to use the Fetch interface; we would use that and make a barty.phpConnect file.
  • bartyCODAPConnector.js: establishes communication with CODAP, and outputs any data items received from the DB.
  • bartyMeetings.js: regulates the meetings parameters and count adjustments if there is a secret meeting.
  • bartStations.js: defines the JSON object barty.stations that makes the stations table in the MySQL database obsolete.
  • php/getBARTYdata.php: receives a POST from barty.manager, then uses PDO (thankfully) to extract the specified data from the DB.
  • php/establishCredentials.php: refers to the publicly-inaccessible credentials file that contains MySQL passwords. Requires that whence be set properly.
  • sql/barty entire 2015.sql: A huge (533 MB) SQL file containing all of the data for 2015.
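The note on bartyManager.js suggests moving the POST to the Fetch interface. Here is a minimal sketch of what that might look like. It is not the plugin's current code: the function names, the criteria keys (startDate, origin, destination), and the assumption that the PHP script replies with JSON are all hypothetical, and the real PHP path is whatever barty.constants defines.

```javascript
// Sketch only. The criteria keys used by callers (e.g. startDate, origin,
// destination) are hypothetical, not barty's actual POST parameter names.
function buildBucketRequest(criteria) {
  const body = new URLSearchParams();
  for (const [key, value] of Object.entries(criteria)) {
    // Skip criteria the user left unset.
    if (value !== undefined && value !== null) {
      body.append(key, String(value));
    }
  }
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}

// Hypothetical replacement for the request that doBucketOfData sends.
// The default URL is a placeholder; use the path from barty.constants.
async function fetchBucketOfData(criteria, url = "php/getBARTYdata.php") {
  const response = await fetch(url, buildBucketRequest(criteria));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // Assumption: the PHP side returns JSON.
  return response.json();
}
```

Keeping the request-building step separate from the network call makes it possible to check the assembled parameters without a server, which is one reason to put this in its own barty.phpConnect file.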

SQL table

  • hours: The large (roughly 40M-record) table, with one record for each hour and each pair of stations.

Setting up the SQL database

The database is huge, so it needs special care. How huge? For every hour, we have the number of riders between every pair of stations. There are about 50 stations, so there are 2500 origin-destination pairs. Multiply by 20 hours in a day and you get 50,000 records per day. Times 400 days and you get 20,000,000 records per year. Because many of these cells are blank (and we overestimated), the actual number is more like 10,000,000 records per year. As a CSV, this amounts to about 260 MB per year, or 40 MB zipped. So four years is 160 MB zipped and about 1 GB unzipped.
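The back-of-envelope estimate can be written out as a short calculation. This sketch uses only the rounded figures from the paragraph above; the variable names are made up for illustration.

```javascript
// Rounded inputs from the text; they deliberately overestimate.
const stations = 50;
const pairs = stations * stations;     // 2,500 origin-destination pairs
const hoursPerDay = 20;
const daysPerYear = 400;

const recordsPerDay = pairs * hoursPerDay;           // 50,000
const recordsPerYear = recordsPerDay * daysPerYear;  // 20,000,000

// Approximate actual figures quoted in the text.
const actualRecordsPerYear = 1e7;   // about 10,000,000
const csvMegabytesPerYear = 260;
const years = 4;

// Implied size of one CSV line, and the total for four years of CSV.
const bytesPerRecord = (csvMegabytesPerYear * 1e6) / actualRecordsPerYear; // 26
const totalCsvMegabytes = csvMegabytesPerYear * years;                     // 1,040
```

About 26 bytes per record is in line with the length of a typical CSV line in the data, such as 2018-01-01,0,12TH,12TH,3.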

You will need files stored in a folder on Google Drive.

As of February 2019, that folder contains:

  • A zip with four csv files, one for each year from 2015 to 2018.
  • A .sql file that, when run, deletes the existing hours table and recreates it, empty (but with the correct structure).

Here is that .sql file:

DROP TABLE IF EXISTS `hours`;

CREATE TABLE `hours` (
  `date` date DEFAULT NULL,
  `hour` int(11) unsigned DEFAULT NULL,
  `origin` varchar(4) DEFAULT NULL,
  `destination` varchar(4) DEFAULT NULL,
  `riders` int(11) unsigned DEFAULT NULL,
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
  KEY `date` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

To orient you (and to make it so you don't have to open one of these files), the .csv for 2018 starts like this:

2018-01-01,0,12TH,12TH,3
2018-01-01,0,12TH,16TH,1
2018-01-01,0,12TH,BAYF,1
2018-01-01,0,12TH,CAST,3
2018-01-01,0,12TH,CIVC,2

Notice that there are no column headers in the first row. The table set-up in SQL is such that the variables are in the right order (although the auto-increment index id is not to be imported).

The order, then, for each line, is: date, hour, origin, destination, riders. Looking at the last sample line, that means that between midnight and 12:59 AM on New Year's morning, two people got off at Civic Center, having gotten on at 12th Street Oakland.
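To make the column order concrete, here is a small sketch that turns one headerless CSV line into a labeled record. The function name parseHoursLine is made up for illustration and is not part of the plugin.

```javascript
// Hypothetical helper: maps one CSV line onto the column order
// date, hour, origin, destination, riders.
function parseHoursLine(line) {
  const fields = line.trim().split(",");
  if (fields.length !== 5) {
    throw new Error(`Expected 5 fields, got ${fields.length}: ${line}`);
  }
  const [date, hour, origin, destination, riders] = fields;
  return {
    date,
    hour: Number.parseInt(hour, 10),     // 0 means midnight to 12:59 AM
    origin,
    destination,
    riders: Number.parseInt(riders, 10),
  };
}

// The last sample line from the 2018 file.
const record = parseHoursLine("2018-01-01,0,12TH,CIVC,2");
// record.origin is "12TH", record.destination is "CIVC", record.riders is 2
```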

What do you actually do?

  • Download and run barty-table-defs.sql.
  • Download and unzip barty-hourly-2015to2018.zip. You will get four .csv files.
  • IMPORT each .csv into the now-extant hours table; if you use Sequel Pro, you will have a chance to make sure the fields in the csv correspond correctly to the ones in the SQL.

In future years, you should be able to download a new annual file from BART. Then (in January 2022, say) you would need only download the new file and IMPORT it.
