Dennis Andersen edited this page Aug 6, 2016 · 2 revisions

Adding Sources to the ffanalytics package

This document describes how to add sources to the ffanalytics package. It first introduces the tables that contain the information the data scrapes need, then walks through the steps for adding a source to the scrapes.

Tables in the package

The following tables hold the information necessary for the data scrapes to work.

sites

This table holds information on the sites that the data comes from and has 4 columns:

  • siteId: this is an integer value that uniquely identifies the site. This is referenced in other tables.
  • siteName: This is the name of the site
  • subscription: Indicates with a 1 that the site requires a subscription and with a 0 that it doesn't.
  • playerId: Indicates that a playerId is retrieved from the source and which column in the playerData table it will be merged with.
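
As a minimal sketch of what a new row in this table could look like, the example below appends a site to a small stand-in for the sites table. The table contents, the site names, and the use of NA for a site without a playerId are all assumptions made for illustration. A plain data.frame is used so the example runs without extra packages.

```r
# Hypothetical stand-in for the package's sites table (all values are made up).
sites <- data.frame(
  siteId       = c(1L, 2L),
  siteName     = c("ExampleSiteA", "ExampleSiteB"),
  subscription = c(0L, 1L),
  playerId     = c("exampleAId", "exampleBId"),
  stringsAsFactors = FALSE
)

# New site: take the next unused siteId so the identifier stays unique.
new_site <- data.frame(
  siteId       = max(sites$siteId) + 1L,
  siteName     = "MyNewSite",
  subscription = 0L,             # 0 = no subscription required
  playerId     = NA_character_,  # assumption: NA when the source has no playerId
  stringsAsFactors = FALSE
)

sites <- rbind(sites, new_site)

# siteId is referenced by other tables, so it must not contain duplicates.
stopifnot(anyDuplicated(sites$siteId) == 0)
```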

analysts

This table holds information on the analysts from the sites that provide the data and has 7 columns:

  • analystId: integer value that uniquely identifies the analyst. This value is referenced in other tables
  • analystName: the name of the analyst
  • siteId: the siteId from the sites table indicating what site the analyst is associated with
  • season: indicates with 1 if the analyst provides season projections and 0 if the analyst doesn't provide season projections
  • weekly: indicates with 1 if the analyst provides weekly projections and 0 if the analyst doesn't provide weekly projections
  • sourceId: this column holds a site specific identifier for the analyst.
  • weight: this is the default weight used for the analyst
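
Because analysts refers back to sites through siteId, it can help to check that link after editing. The sketch below uses small made-up stand-ins for both tables. The analyst names, sourceId values, and weights are hypothetical.

```r
# Hypothetical stand-ins for the sites and analysts tables (values are made up).
sites <- data.frame(
  siteId   = c(1L, 2L),
  siteName = c("ExampleSiteA", "ExampleSiteB"),
  stringsAsFactors = FALSE
)

analysts <- data.frame(
  analystId   = c(10L, 11L, 12L),
  analystName = c("Analyst One", "Analyst Two", "Analyst Three"),
  siteId      = c(1L, 2L, 3L),   # siteId 3 is deliberately missing from sites
  season      = c(1L, 1L, 0L),
  weekly      = c(1L, 0L, 1L),
  sourceId    = c(NA, "a", "b"),
  weight      = c(1, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Analysts whose siteId has no matching row in sites.
orphans <- analysts[!(analysts$siteId %in% sites$siteId), ]

# Names of the analysts flagged as providing season projections.
season_analysts <- analysts$analystName[analysts$season == 1]
```

Here orphans contains the row for "Analyst Three", which signals that the site should be added to sites first.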

analystPositions

This table holds information on which positions each analyst in analysts provides data for and has 5 columns:

  • analystPosId: a sequential integer
  • analystId: the unique identifier for the analyst found in the analysts table.
  • season: indicates with a 1 if the analyst provides season projections for this position
  • weekly: indicates with a 1 if the analyst provides weekly projections for this position
  • position: name of the position. Should be one of QB, RB, WR, TE, K, DST, DL, LB, DB
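
The rows for one analyst can be generated rather than typed one by one. The helper below is a hypothetical sketch, not a function from the package. It assumes that a 0 marks a period the analyst does not cover and that new analystPosId values simply continue from the last one in use.

```r
# Position names listed in the column description above.
valid_positions <- c("QB", "RB", "WR", "TE", "K", "DST", "DL", "LB", "DB")

# Hypothetical helper: build analystPositions rows for one analyst.
make_position_rows <- function(analyst_id, positions, season, weekly, last_id) {
  bad <- setdiff(positions, valid_positions)
  if (length(bad) > 0) {
    stop("Unknown position(s): ", paste(bad, collapse = ", "))
  }
  data.frame(
    analystPosId = last_id + seq_along(positions),  # continue the sequence
    analystId    = analyst_id,
    season       = season,
    weekly       = weekly,
    position     = positions,
    stringsAsFactors = FALSE
  )
}

# Analyst 12 projects season totals for four positions; 40 is a made-up last id.
new_rows <- make_position_rows(12L, c("QB", "RB", "WR", "TE"),
                               season = 1L, weekly = 0L, last_id = 40L)
```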

siteUrls

This table holds information on the URLs used for the data scrapes, that is, the location of the data. There are 7 columns in this table:

  • siteId: The integer value indicating what site the url belongs to
  • siteUrl: The URL for the site. Can also be a file path. Place holders are used in the url address:
    • {$Season} indicates where the season is placed
    • {$WeekNo} indicates where the week number is placed
    • {$PgeID} indicates where the page number is placed if the data is spread over multiple pages
    • {$SrcID} indicates where the analyst id is placed
    • {$Pos} indicates where the position id is placed.
  • urlPeriod: Indicates whether the url is used for season or week projections
  • urlType: Indicates the type of url. Can be html, xls, xml, csv or jqry
  • nameCol: Indicates the number of the column that holds the player name
  • urlTable: Indicates which table on the page is used, by number. It is used for readHTMLTable.
  • playerLink: Indicates whether the playerId can be retrieved from an HTML page
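
To illustrate how the placeholders in siteUrl might be filled in, here is a small sketch using base R string replacement. The fill_url function and the example URL are hypothetical. The package's own substitution code may work differently.

```r
# Hypothetical helper: replace each {$Name} placeholder with its value.
fill_url <- function(url, values) {
  for (name in names(values)) {
    placeholder <- paste0("{$", name, "}")
    # fixed = TRUE treats the placeholder as a literal string, not a regex.
    url <- gsub(placeholder, as.character(values[[name]]), url, fixed = TRUE)
  }
  url
}

# Made-up URL template using four of the placeholders described above.
template <- "http://www.example.com/projections/{$Season}/week{$WeekNo}?pos={$Pos}&page={$PgeID}"

filled <- fill_url(template, list(Season = 2016, WeekNo = 1, Pos = "QB", PgeID = 2))
# filled is "http://www.example.com/projections/2016/week1?pos=QB&page=2"
```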

siteTables

This table holds information on the tables at the URLs and has 9 columns:

  • tableId: A unique integer identifying the table
  • positionAlias: The value the site uses to identify positions
  • siteId: Integer value that identifies the site the table belongs to
  • startPage: If the data is spread over multiple pages this indicates the start page
  • endPage: If the data is spread over multiple pages this indicates the end page
  • stepPage: If the data is spread over multiple pages this indicates the step size between consecutive pages
  • season: Indicates with a 1 if the table is used for season projections
  • weekly: Indicates with a 1 if the table is used for weekly projections
  • position: The position the table holds data for
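
One plausible reading of startPage, endPage, and stepPage is that together they define the sequence of page values substituted into the URL. The sketch below shows that reading. It is an assumption about how the three columns combine, and table_pages is a hypothetical name.

```r
# Hypothetical helper: page values implied by startPage, endPage and stepPage.
table_pages <- function(startPage, endPage, stepPage) {
  seq(from = startPage, to = endPage, by = stepPage)
}

# A site that lists 25 players per page and paginates by row offset.
pages <- table_pages(startPage = 1, endPage = 101, stepPage = 25)
```

With these made-up values, pages is 1, 26, 51, 76, 101, so five requests would be made for this table.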

tableColumns

This table holds information on what columns are in each siteTable and has 6 columns:

  • tableId: Integer value that identifies what table the column belongs to
  • columnName: the name of the column. Should match a columnName in dataColumns
  • columnType: indicates what data type is in the column
  • columnOrder: the order of the column in the table
  • columnPeriod: Indicates if the column is used for season or week projections
  • removeColumn: Indicates with a 1 if the column can be removed before returning
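
Since columnName should match a columnName in dataColumns, a quick comparison can flag names that need review. The sketch below uses made-up column names, types, and period values. None of them are taken from the package.

```r
# Hypothetical stand-ins for dataColumns and tableColumns (values are made up).
dataColumns <- data.frame(
  dataColId  = 1:3,
  columnName = c("player", "passYds", "passTds"),
  columnType = c("character", "numeric", "numeric"),
  stringsAsFactors = FALSE
)

tableColumns <- data.frame(
  tableId      = c(1L, 1L, 1L, 1L),
  columnName   = c("player", "passYds", "passTds", "rank"),
  columnType   = c("character", "numeric", "numeric", "numeric"),
  columnOrder  = c(2L, 3L, 4L, 1L),
  columnPeriod = "season",
  removeColumn = c(0L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Column names used in tableColumns that dataColumns does not know about.
missing_cols <- setdiff(tableColumns$columnName, dataColumns$columnName)

# Columns in the order they appear on the site, keeping only the retained ones.
ordered <- tableColumns[order(tableColumns$columnOrder), ]
kept    <- ordered$columnName[ordered$removeColumn == 0]
```

Here missing_cols is "rank", a column that is also flagged for removal, and kept lists the three remaining columns in site order.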

dataColumns

This table holds information on the different data columns used in the data tables and has 3 columns:

  • dataColId: Integer value uniquely identifying the column
  • columnName: The name of the column
  • columnType: Data type of the column

Adding sources

Sources are added by modifying the data tables described above, for example with:

sites <- data.table::data.table(edit(sites))

The following order is suggested:

  1. Add the site in the sites table
  2. Add the analyst information in the analysts table
  3. Add the positions that the analyst is projecting for in the analystPositions table
  4. Add the URL to the siteUrls table
  5. Add the table information to the siteTables table
  6. Review and add columns to the dataColumns table if needed
  7. Add information on the table columns to the tableColumns table.
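
After working through the steps, a quick check that the new siteId appears in each table that references it can catch a skipped step. The function below is a hypothetical sketch and not part of the package. The small tables are made-up stand-ins.

```r
# Hypothetical helper: report which tables contain rows for a given siteId.
check_site <- function(site_id, sites, analysts, siteUrls, siteTables) {
  c(
    sites      = site_id %in% sites$siteId,
    analysts   = site_id %in% analysts$siteId,
    siteUrls   = site_id %in% siteUrls$siteId,
    siteTables = site_id %in% siteTables$siteId
  )
}

# Made-up stand-ins: site 3 has been added everywhere except siteTables.
sites      <- data.frame(siteId = 1:3)
analysts   <- data.frame(siteId = c(1L, 2L, 3L))
siteUrls   <- data.frame(siteId = c(1L, 2L, 3L))
siteTables <- data.frame(siteId = c(1L, 2L))

status <- check_site(3L, sites, analysts, siteUrls, siteTables)
```

Any FALSE entry in status points to a step in the list above that still needs to be completed for that site.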