Adding Sources
This document describes how to add sources to the ffanalytics package. It first introduces the tables that contain the information needed for the data scrapes to work, then walks through the steps for adding a source to the scrapes.
The following tables hold the information necessary for the data scrapes to work.
The sites table holds information on the sites that the data comes from and has 4 columns:
- siteId: an integer value that uniquely identifies the site. It is referenced in other tables.
- siteName: the name of the site.
- subscription: 1 if the site requires a subscription, 0 if it doesn't.
- playerId: indicates that a playerId is retrieved from the source and which column in the playerData table it will be merged with.
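As an illustration of the four columns described above, here is a minimal sketch of a single sites row. All values, including the name exampleSites and the playerId column name, are made up for this example and are not taken from the package.

```r
library(data.table)

# Hypothetical example row; none of these values come from the package.
exampleSites <- data.table(
  siteId       = 1L,
  siteName     = "ExampleSite",
  subscription = 0L,                # 0 = no subscription required
  playerId     = "examplePlayerId"  # assumed name of a playerData column
)

# siteId is referenced from other tables, so it has to be unique.
stopifnot(!anyDuplicated(exampleSites$siteId))
```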
The analysts table holds information on the analysts that provide the data for each site and has 7 columns:
- analystId: an integer value that uniquely identifies the analyst. It is referenced in other tables.
- analystName: the name of the analyst.
- siteId: the siteId from the sites table, indicating which site the analyst is associated with.
- season: 1 if the analyst provides season projections, 0 if not.
- weekly: 1 if the analyst provides weekly projections, 0 if not.
- sourceId: a site-specific identifier for the analyst.
- weight: the default weight used for the analyst.
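The sketch below shows what an analysts row could look like and how its siteId can be checked against the sites table. The table names exampleSites and exampleAnalysts and every value in them are hypothetical; the type of sourceId in particular is an assumption.

```r
library(data.table)

# Hypothetical stand-in for the sites table.
exampleSites <- data.table(siteId = 1L, siteName = "ExampleSite",
                           subscription = 0L, playerId = "examplePlayerId")

# Hypothetical analyst row with the seven columns described above.
exampleAnalysts <- data.table(
  analystId   = 1L,
  analystName = "Example Analyst",
  siteId      = 1L,     # must exist in the sites table
  season      = 1L,     # provides season projections
  weekly      = 0L,     # does not provide weekly projections
  sourceId    = "123",  # assumed to be stored as text
  weight      = 1
)

# Every analyst should point at a known site.
orphans <- exampleAnalysts[!siteId %in% exampleSites$siteId]
stopifnot(nrow(orphans) == 0L)
```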
The analystPositions table holds information on which positions each analyst in analysts provides data for and has 5 columns:
- analystPosId: a sequential integer.
- analystId: the unique identifier for the analyst found in the analysts table.
- season: 1 if the analyst provides season projections for this position.
- weekly: 1 if the analyst provides weekly projections for this position.
- position: the name of the position. Should be one of QB, RB, WR, TE, K, DST, DL, LB, DB.
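A minimal sketch of analystPositions rows for one analyst, with a check that every position is one of the allowed names listed above. The rows and the name exampleAnalystPositions are invented for illustration.

```r
library(data.table)

# Allowed position names, as listed in the column description.
validPositions <- c("QB", "RB", "WR", "TE", "K", "DST", "DL", "LB", "DB")

# Hypothetical rows: one analyst covering three positions.
exampleAnalystPositions <- data.table(
  analystPosId = 1:3,
  analystId    = 1L,
  season       = 1L,
  weekly       = c(1L, 1L, 0L),  # no weekly projections for kickers here
  position     = c("QB", "RB", "K")
)

# Flag any position that is not in the allowed set.
badPositions <- setdiff(exampleAnalystPositions$position, validPositions)
stopifnot(length(badPositions) == 0L)
```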
The siteUrls table holds information on the URLs used for the data scrapes, that is, the location of the data. There are 7 columns in this table:
- siteId: the integer value indicating which site the URL belongs to.
- siteUrl: the URL for the site. Can also be a file path. Placeholders are used in the URL:
  - {$Season} indicates where the season is placed.
  - {$WeekNo} indicates where the week number is placed.
  - {$PgeID} indicates where the page number is placed if the data is spread over multiple pages.
  - {$SrcID} indicates where the analyst id is placed.
  - {$Pos} indicates where the position id is placed.
- urlPeriod: indicates whether the URL is used for season or week projections.
- urlType: the type of URL. Can be html, xls, xml, csv or jqry.
- nameCol: the number of the column that holds the player name.
- urlTable: the number of the table on the page that is being used. Used for readHTMLTable.
- playerLink: indicates whether it is possible to retrieve the playerId from an HTML page.
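To make the placeholder mechanism concrete, the sketch below fills a siteUrl template with plain string replacement. The helper fillUrl and the example.com URL are hypothetical; the package may perform the substitution differently.

```r
# Hypothetical helper: replace each {$Name} placeholder with its value.
fillUrl <- function(template, values) {
  for (name in names(values)) {
    placeholder <- paste0("{$", name, "}")
    # fixed = TRUE treats the placeholder as a literal string, not a regex.
    template <- gsub(placeholder, as.character(values[[name]]),
                     template, fixed = TRUE)
  }
  template
}

# Made-up URL template using four of the placeholders.
template <- "http://www.example.com/projections/{$Season}/week{$WeekNo}?pos={$Pos}&page={$PgeID}"

url <- fillUrl(template, list(Season = 2016, WeekNo = 3, Pos = "QB", PgeID = 1))
print(url)
# "http://www.example.com/projections/2016/week3?pos=QB&page=1"
```

Literal matching matters here because the characters {, $ and } all have special meaning in regular expressions.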
The siteTables table holds information on the tables at the URLs. This table has 9 columns:
- tableId: a unique integer identifying the table.
- positionAlias: the value the site uses to identify positions.
- siteId: the integer value that identifies the site the table belongs to.
- startPage: if the data is spread over multiple pages, the start page.
- endPage: if the data is spread over multiple pages, the end page.
- stepPage: if the data is spread over multiple pages, the step between pages.
- season: 1 if the table is used for season projections.
- weekly: 1 if the table is used for weekly projections.
- position: the position the table holds data for.
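One plausible reading of startPage, endPage and stepPage is that they define a sequence of page values, each substituted for {$PgeID} in the URL. The sketch below works under that assumption; the values and the example.com URL are made up.

```r
# Hypothetical siteTables values for a table spread over three pages.
startPage <- 0L
endPage   <- 80L
stepPage  <- 40L

# Page values to request: 0, 40, 80.
pages <- seq(from = startPage, to = endPage, by = stepPage)

# Made-up URL template with a page placeholder.
template <- "http://www.example.com/projections?offset={$PgeID}"

# Build one URL per page value.
urls <- vapply(
  pages,
  function(p) gsub("{$PgeID}", as.character(p), template, fixed = TRUE),
  character(1)
)
print(urls)
```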
The tableColumns table holds information on which columns are in each siteTable and has 6 columns:
- tableId: the integer value that identifies which table the column belongs to.
- columnName: the name of the column. Should match a columnName in dataColumns.
- columnType: the data type of the column.
- columnOrder: the order of the column in the table.
- columnPeriod: indicates whether the column is used for season or week projections.
- removeColumn: 1 if the column can be removed before returning.
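The sketch below illustrates how columnOrder and removeColumn could be used together: first to name the scraped columns in order, then to drop the ones flagged for removal. The rows, the column names, and the columnType and columnPeriod values are all hypothetical.

```r
library(data.table)

# Hypothetical column definitions for one table, entered out of order.
exampleTableColumns <- data.table(
  tableId      = 1L,
  columnName   = c("passYds", "player", "rank"),
  columnType   = c("numeric", "character", "numeric"),
  columnOrder  = c(2L, 1L, 3L),
  columnPeriod = "season",
  removeColumn = c(0L, 0L, 1L)  # "rank" is dropped before returning
)

# Column names in the order they appear in the scraped table.
ordered      <- exampleTableColumns[order(columnOrder)]
scrapedNames <- ordered$columnName

# Column names that remain after removal.
keptNames <- ordered[removeColumn == 0L, columnName]

print(scrapedNames)  # "player" "passYds" "rank"
print(keptNames)     # "player" "passYds"
```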
The dataColumns table holds information on the different data columns used in the data tables and has 3 columns:
- dataColId: an integer value uniquely identifying the column.
- columnName: the name of the column.
- columnType: the data type of the column.
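Since each columnName in tableColumns should match one in dataColumns, a quick consistency check can catch missing entries before a scrape is run. This is a minimal sketch with invented rows; the names exampleDataColumns and exampleTableColumns are hypothetical.

```r
library(data.table)

# Hypothetical dataColumns entries.
exampleDataColumns <- data.table(
  dataColId  = 1:3,
  columnName = c("player", "passYds", "passTds"),
  columnType = c("character", "numeric", "numeric")
)

# Hypothetical tableColumns entries; "rushYds" has no dataColumns match.
exampleTableColumns <- data.table(
  tableId    = 1L,
  columnName = c("player", "passYds", "rushYds")
)

# Names used in tableColumns that still need to be added to dataColumns.
missingColumns <- setdiff(exampleTableColumns$columnName,
                          exampleDataColumns$columnName)
print(missingColumns)  # "rushYds"
```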
Sources are added by modifying the data tables described above, for example with:
sites <- data.table::data.table(edit(sites))
The following order is suggested:
- Add the site in the sites table
- Add the analyst information in the analysts table
- Add the positions that the analyst is projecting for in the analystPositions table
- Add the URL to the siteUrls table
- Add the table information to the siteTables table
- Review and add columns to the dataColumns table if needed
- Add information on the table columns to the tableColumns table
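As an alternative to interactive editing, rows can also be appended programmatically. The sketch below covers the first two steps on stand-in tables with made-up rows; the names exampleSites and exampleAnalysts and all values are hypothetical, and the remaining tables would be extended in the same way.

```r
library(data.table)

# Stand-in tables with made-up rows; the real tables ship with the package.
exampleSites <- data.table(
  siteId = 1:2, siteName = c("SiteA", "SiteB"),
  subscription = c(0L, 1L), playerId = c("idA", "idB")
)
exampleAnalysts <- data.table(
  analystId = 1:2, analystName = c("Analyst A", "Analyst B"),
  siteId = 1:2, season = 1L, weekly = 1L,
  sourceId = c("a", "b"), weight = 1
)

# Step 1: add the site, using the next free siteId.
newSiteId <- max(exampleSites$siteId) + 1L
exampleSites <- rbind(exampleSites, data.table(
  siteId = newSiteId, siteName = "NewSite",
  subscription = 0L, playerId = NA_character_
))

# Step 2: add the analyst, linked to the new site through siteId.
newAnalystId <- max(exampleAnalysts$analystId) + 1L
exampleAnalysts <- rbind(exampleAnalysts, data.table(
  analystId = newAnalystId, analystName = "New Analyst",
  siteId = newSiteId, season = 1L, weekly = 0L,
  sourceId = NA_character_, weight = 1
))
```

Adding the site first means its siteId exists by the time the analyst row refers to it, which is the reason for the suggested order.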