
kardeepak77/ST558Prj1


ProjectNHL

Deepak Karawande June 13, 2021

Introduction

This vignette introduces querying REST API endpoints with R and performing exploratory data analysis with various R packages. We'll be using the NHL Records REST API endpoints, which can be accessed by following the instructions at https://gitlab.com/dword4/nhlapi/-/blob/master/records-api.md

Packages

The following packages were used to access the REST API endpoints and to explore and present the data.

library("httr")
library("jsonlite")
library("tidyverse")
library("kableExtra")

Accessing NHL Data

Contact the REST API and create a data frame

The NHLAPI project on GitLab documents REST API endpoints for accessing various data points about historical NHL games.

For this project, I accessed 7 different NHLAPI endpoints to fetch information about the NHL:

1. Franchise summary
2. Franchise details
3. Total stats for franchise
4. Season records
5. Skater records
6. Admin history and retired numbers
7. Team stats

The GET function from the httr package was used to fetch data from the REST API. The response body was then converted into an R data frame using httr's content function and jsonlite's fromJSON function.

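# Base URL for the NHL Records API
get_baseURL <- function() { return("https://records.nhl.com/site/api") }

# Helper function: fetch data from a REST API endpoint and parse the JSON body.
# Returns a list; the records themselves are in its `data` element as a data frame.
fetch_DF <- function(api_url) {
  get_response <- GET(api_url)
  stop_for_status(get_response)   # fail early with an informative error on HTTP errors
  json_contents <- content(get_response, as = "text", encoding = "UTF-8")
  list_res <- fromJSON(json_contents, flatten = TRUE)
  return(list_res)
}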

# Get all franchises as a data frame
get_all_franchise <- function() {
  tab_name <- "franchise"
  full_url <- paste0(get_baseURL(), "/", tab_name)
  franchise_res <- fetch_DF(full_url)
  franchise_res$data
}
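To see what fetch_DF hands back without calling the live API, the sketch below parses a small hand-written JSON string with fromJSON(flatten = TRUE). The payload is an assumption modeled on how the code reads `$data`: a `data` array of records plus a `total` count. The nested `meta` object is hypothetical and only there to show what flattening does.

```r
library(jsonlite)

# Hypothetical payload shaped like a records response: a "data" array plus a "total" count.
# The nested "meta" object is made up to illustrate flatten = TRUE.
json_text <- '{"data": [
  {"id": 1, "teamCommonName": "Canadiens", "meta": {"active": true}},
  {"id": 2, "teamCommonName": "Wanderers", "meta": {"active": false}}
], "total": 2}'

# The array of objects becomes a data frame; flatten = TRUE turns the nested
# object into a plain column named "meta.active"
parsed <- fromJSON(json_text, flatten = TRUE)

str(parsed$data)
parsed$total
```

This is why get_all_franchise and get_NHL_records return `$data` rather than the whole parsed list.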

Fetch data by Id or Name

By default, the NHL REST API returns data for all franchises or teams. Using helper functions, I added the ability to pass an Id or Name to fetch data for a specific franchise/team. If neither an Id nor a Name is passed, the query functions return data for all franchises or teams.

# Returns the franchise id. If fran_id is given it is returned as-is; otherwise the id is
# looked up by matching teamCommonName to fran_name. Returns -1 if the name does not match
# exactly one franchise, and NA if neither argument is given.
get_franchise_id <- function(fran_id=NA, fran_name=NA) {
  retValue <- NA
  
  if(!is.na(fran_id)){
    return(fran_id)
  } else if(!is.na(fran_name)) {
    #get id from name
    franchise <- get_all_franchise() %>% filter(teamCommonName == fran_name)
    if(nrow(franchise) == 1) {
      retValue <- franchise$id
    } else {
      #we couldn't find the teamid for team name specified.
      retValue <- -1
    }
  }
  return(retValue)
}
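The name-to-id rules in get_franchise_id can be tried offline. The sketch below is a hypothetical base-R variant, lookup_franchise_id, that takes the franchise table as an argument instead of calling get_all_franchise(), so it runs on a small sample of rows taken from the Franchise Summary Preview table.

```r
# Hypothetical offline variant of get_franchise_id: the franchise table is passed in
# instead of being fetched from the API.
lookup_franchise_id <- function(franchises, fran_id = NA, fran_name = NA) {
  if (!is.na(fran_id)) {
    return(fran_id)                       # an explicit id wins
  }
  if (is.na(fran_name)) {
    return(NA)                            # neither given: caller fetches all franchises
  }
  match_rows <- franchises[franchises$teamCommonName == fran_name, , drop = FALSE]
  if (nrow(match_rows) == 1) match_rows$id else -1   # -1 signals "name not found"
}

# Small sample of the franchise table
franchises <- data.frame(id = c(1, 6, 10),
                         teamCommonName = c("Canadiens", "Bruins", "Rangers"),
                         stringsAsFactors = FALSE)

lookup_franchise_id(franchises, fran_name = "Bruins")   # 6
lookup_franchise_id(franchises, fran_name = "Whalers")  # -1
lookup_franchise_id(franchises, fran_id = 23)           # 23
lookup_franchise_id(franchises)                         # NA
```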

Function for fetching NHL records

With the help of the functions above, I created an R function that fetches data from the desired NHL Records endpoint using either a franchise id or a team common name. If neither is provided, the API returns data for all franchises. A similar set of functions was used to contact the NHL team stats REST API endpoint.

# Get NHL records as a data frame for the team given by id/name, or for all teams.
# If the named team is not found, an empty data frame is returned.
get_NHL_records <- function(tab_name, fran_id=NA, fran_name=NA) {

  full_url <- ""
  id_filter <- ""
  id <- get_franchise_id(fran_id, fran_name)
  
  if(!is.na(id)) {
    if(id != -1) {
      # Fetch records for the given franchise id.
      # The franchise tables filter on `id`; the other records tables filter on `franchiseId`.
      if(tab_name %in% c("franchise", "franchise-detail")) {
        id_filter <- paste0("cayenneExp=id=", id)
      } else {
        id_filter <- paste0("cayenneExp=franchiseId=", id)
      }
      full_url <- paste0(get_baseURL(), "/", tab_name, "?", id_filter)
    } else {
      # Couldn't find a franchise by the name provided.
      return(data.frame())
    }
  } else {
    # Fetch records for all franchises
    full_url <- paste0(get_baseURL(), "/", tab_name)
  }
  
  # Fetch NHL records
  records_res <- fetch_DF(full_url)
  return(records_res$data)
}
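The URL that get_NHL_records builds can be checked without a network call. This pure-string sketch (hypothetical helper build_records_url) reproduces the same rules: the franchise and franchise-detail tables filter on `id`, the other records tables on `franchiseId`, and no id means no filter.

```r
# Hypothetical helper mirroring the URL-building branch of get_NHL_records
build_records_url <- function(tab_name, id = NA,
                              base_url = "https://records.nhl.com/site/api") {
  url <- paste0(base_url, "/", tab_name)
  if (is.na(id)) {
    return(url)                                   # no filter: all franchises
  }
  # franchise tables key on "id"; per-franchise records tables key on "franchiseId"
  key <- if (tab_name %in% c("franchise", "franchise-detail")) "id" else "franchiseId"
  paste0(url, "?cayenneExp=", key, "=", id)
}

build_records_url("franchise", 6)
# "https://records.nhl.com/site/api/franchise?cayenneExp=id=6"
build_records_url("franchise-skater-records", 6)
# "https://records.nhl.com/site/api/franchise-skater-records?cayenneExp=franchiseId=6"
build_records_url("franchise-goalie-records")
# "https://records.nhl.com/site/api/franchise-goalie-records"
```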

Wrapper function

Using a switch statement, I created a wrapper function that provides a single, simple entry point for fetching data from all NHL REST API endpoints relevant to this project:

NHL_wrapper_api <- function(command, fran_id=NA, fran_name=NA, team_id=NA, team_name=NA) {
  result <- switch(
      command,
      "get_franchise"= get_NHL_records("franchise", fran_id, fran_name),
      "get_total_stats"= get_NHL_records("franchise-team-totals", fran_id, fran_name),
      "get_season_records"= get_NHL_records("franchise-season-records", fran_id, fran_name),
      "get_goalie_records"= get_NHL_records("franchise-goalie-records", fran_id, fran_name),
      "get_skater_records"= get_NHL_records("franchise-skater-records", fran_id, fran_name),
      "get_franchise_detail"= get_NHL_records("franchise-detail", fran_id, fran_name),
      "get_stats_for_team"= get_NHL_stats(team_id, team_name),
      # Default branch: unknown command
      stop("command not found, available commands are:\n",
           paste("get_franchise",
                 "get_total_stats",
                 "get_season_records",
                 "get_goalie_records",
                 "get_skater_records",
                 "get_franchise_detail",
                 "get_stats_for_team",
                 sep = "\n"))
  )
  return(result)
}
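R's switch() evaluates only the first matching named argument, and a trailing unnamed argument acts as the default; that default is what handles unknown commands in NHL_wrapper_api. A minimal sketch of the same dispatch, using a hypothetical endpoint_for function that maps commands to endpoint names without calling the API:

```r
# Hypothetical dispatcher: maps a wrapper command to its records endpoint name
endpoint_for <- function(command) {
  switch(command,
    "get_franchise"        = "franchise",
    "get_total_stats"      = "franchise-team-totals",
    "get_season_records"   = "franchise-season-records",
    "get_goalie_records"   = "franchise-goalie-records",
    "get_skater_records"   = "franchise-skater-records",
    "get_franchise_detail" = "franchise-detail",
    # unnamed last argument = default branch, evaluated only when nothing matches
    stop("command not found: ", command)
  )
}

endpoint_for("get_total_stats")   # "franchise-team-totals"

# Unknown commands raise an error instead of returning silently
msg <- tryCatch(endpoint_for("get_standings"), error = conditionMessage)
msg                               # "command not found: get_standings"
```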

Exploratory Data Analysis

Combine franchise summary and details in a single summary table

To explore the data, I started by combining the franchise summary and franchise detail data with inner_join to get a basic tabular view of all franchises. The resulting summary table was rendered with the kable function from knitr.

# Fetch franchise and franchise detail
franchise <- as_tibble(NHL_wrapper_api(command="get_franchise"))
franchise_details <- as_tibble(NHL_wrapper_api(command="get_franchise_detail"))

# Combine franchise and franchise detail using an inner join on id.
franchise_joined <- inner_join(franchise, franchise_details, by="id") %>%
  select(id, heroImageUrl, teamCommonName, active)

# Turn non-missing image URLs into markdown image tags so they render in the table.
# Subset both sides so the replacement length matches and NA stays NA.
has_img <- !is.na(franchise_joined$heroImageUrl)
franchise_joined$heroImageUrl[has_img] <- sprintf("![](%s){width=100px}", franchise_joined$heroImageUrl[has_img])

# Print
franchise_joined %>%
  knitr::kable(caption="Franchise Summary Preview") %>%
  kable_styling()
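Franchise Summary Preview

| id | heroImageUrl | teamCommonName | active |
|---|---|---|---|
| 1 | | Canadiens | TRUE |
| 2 | | Wanderers | FALSE |
| 3 | | Eagles | FALSE |
| 4 | | Tigers | FALSE |
| 5 | | Maple Leafs | TRUE |
| 6 | | Bruins | TRUE |
| 7 | | Maroons | FALSE |
| 8 | | Americans | FALSE |
| 9 | | Quakers | FALSE |
| 10 | | Rangers | TRUE |
| 11 | | Blackhawks | TRUE |
| 12 | | Red Wings | TRUE |
| 13 | | Barons | FALSE |
| 14 | | Kings | TRUE |
| 15 | | Stars | TRUE |
| 16 | | Flyers | TRUE |
| 17 | | Penguins | TRUE |
| 18 | | Blues | TRUE |
| 19 | | Sabres | TRUE |
| 20 | | Canucks | TRUE |
| 21 | | Flames | TRUE |
| 22 | | Islanders | TRUE |
| 23 | | Devils | TRUE |
| 24 | | Capitals | TRUE |
| 25 | | Oilers | TRUE |
| 26 | | Hurricanes | TRUE |
| 27 | | Avalanche | TRUE |
| 28 | | Coyotes | TRUE |
| 29 | | Sharks | TRUE |
| 30 | | Senators | TRUE |
| 31 | | Lightning | TRUE |
| 32 | | Ducks | TRUE |
| 33 | | Panthers | TRUE |
| 34 | | Predators | TRUE |
| 35 | | Jets | TRUE |
| 36 | | Blue Jackets | TRUE |
| 37 | | Wild | TRUE |
| 38 | | Golden Knights | TRUE |
| 39 | NA | Kraken | TRUE |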
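The join-then-format step can be reproduced on toy data. The sketch below swaps in base R's merge(), which performs an inner join by default (all = FALSE), in place of dplyr::inner_join, and then applies sprintf() only to non-missing URLs so NA stays NA. The URLs, the extra id 40 and its values are made up.

```r
# Toy stand-ins for the franchise and franchise-detail tables (values are made up)
franchise <- data.frame(id = c(1, 2, 39),
                        teamCommonName = c("Canadiens", "Wanderers", "Kraken"),
                        stringsAsFactors = FALSE)
franchise_details <- data.frame(id = c(1, 39, 40),
                                active = c(TRUE, TRUE, FALSE),
                                heroImageUrl = c("https://example.com/1.jpg", NA,
                                                 "https://example.com/40.jpg"),
                                stringsAsFactors = FALSE)

# Inner join: keep only ids present in both tables (here 1 and 39)
joined <- merge(franchise, franchise_details, by = "id")

# Wrap only non-missing URLs in markdown image syntax; NA stays NA
has_img <- !is.na(joined$heroImageUrl)
joined$heroImageUrl[has_img] <- sprintf("![](%s){width=100px}", joined$heroImageUrl[has_img])

joined
```

Indexing both sides with the same logical vector keeps the replacement length equal to the number of slots being filled, which is what avoids R's recycling warning.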

Creating new variables

There are many ways to compute new variables and add them to a dataset. I used the group_by and summarise functions to create two new variables, totalWins and totalLosses, for each unique combination of franchise id and team. I also computed the win fraction perWins = totalWins / (totalWins + totalLosses), rounded to two decimals, for each such combination.

# Fetch franchise team totals and calculate the fraction of games won
Team_total_stats <- as_tibble(NHL_wrapper_api(command="get_total_stats"))

Team_total_stats %>%
  select(teamName, franchiseId, wins, losses) %>%
  group_by(franchiseId, teamName) %>%
  summarise(totalWins = sum(wins), totalLosses = sum(losses), .groups = "drop") %>%
  mutate(perWins = round(totalWins/(totalWins+totalLosses), 2)) %>%
  arrange(desc(perWins)) %>%
  head() %>%
  knitr::kable(caption="Win/Loss Percentages by Franchise & Teams") %>%
  kable_styling()
Win/Loss Percentages by Franchise & Teams

| franchiseId | teamName | totalWins | totalLosses | perWins |
|---|---|---|---|---|
| 38 | Vegas Golden Knights | 210 | 120 | 0.64 |
| 1 | Montréal Canadiens | 3917 | 2623 | 0.60 |
| 15 | Dallas Stars | 1189 | 833 | 0.59 |
| 16 | Philadelphia Flyers | 2310 | 1670 | 0.58 |
| 27 | Colorado Avalanche | 1131 | 822 | 0.58 |
| 6 | Boston Bruins | 3573 | 2740 | 0.57 |
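The perWins arithmetic is easy to check by hand: only wins and losses enter the denominator, so other outcomes are ignored. A base-R sketch of the same group-and-sum, swapping aggregate() in for group_by/summarise, on made-up rows where a team appears more than once:

```r
# Made-up rows: the same franchise/team pair can appear in more than one row
totals <- data.frame(franchiseId = c(1, 1, 6, 6),
                     teamName = c("Team A", "Team A", "Team B", "Team B"),
                     wins = c(30, 10, 25, 5),
                     losses = c(15, 5, 20, 10),
                     stringsAsFactors = FALSE)

# Sum wins and losses within each franchiseId/teamName combination
by_team <- aggregate(totals[c("wins", "losses")],
                     by = totals[c("franchiseId", "teamName")], FUN = sum)
names(by_team)[names(by_team) == "wins"] <- "totalWins"
names(by_team)[names(by_team) == "losses"] <- "totalLosses"

# Fraction of wins among wins + losses, rounded to two decimals as in the dplyr pipeline
by_team$perWins <- round(by_team$totalWins / (by_team$totalWins + by_team$totalLosses), 2)

# Highest win fraction first
ranked <- by_team[order(-by_team$perWins), ]
ranked
```

Team A has 40 wins and 20 losses, so perWins = 40 / 60 ≈ 0.67; Team B has 30 and 30, so 0.5.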

Contingency tables

Contingency tables were created for the total goals scored by skaters, broken down by franchise and position. Two separate tables were created, one for active and one for inactive players.

# Fetch skater records to build contingency tables of goals by franchise and position
skaters <- as_tibble(NHL_wrapper_api(command="get_skater_records"))

skaters$positionCode <- factor(skaters$positionCode)
skaters$franchiseName <- factor(skaters$franchiseName)

# Relabel the position codes; this relies on the sorted levels being C, D, L, R.
levels(skaters$positionCode) <- c("Center Forward", "Defenseman", "Left Wing Forward", "Right Wing Forward")

skaters %>%
  filter(activePlayer == TRUE) %>%
  group_by(franchiseName, positionCode) %>%
  summarise(TotalGoals = sum(goals), .groups = "drop") %>%
  spread(positionCode, TotalGoals) %>%
  head() %>%
  knitr::kable(caption="Active Players: Goal Counts by Franchise and Position") %>%
  kable_styling()

Active Players: Goal Counts by Franchise and Position

| franchiseName | Center Forward | Defenseman | Left Wing Forward | Right Wing Forward |
|---|---|---|---|---|
| Anaheim Ducks | 677 | 237 | 308 | 797 |
| Arizona Coyotes | 294 | 351 | 200 | 272 |
| Boston Bruins | 1020 | 359 | 663 | 420 |
| Buffalo Sabres | 450 | 217 | 377 | 218 |
| Calgary Flames | 600 | 308 | 434 | 96 |
| Carolina Hurricanes | 970 | 244 | 471 | 110 |
skaters %>%
  filter(activePlayer == FALSE) %>%
  group_by(franchiseName, positionCode) %>%
  summarise(TotalGoals = sum(goals), .groups = "drop") %>%
  spread(positionCode, TotalGoals) %>%
  head() %>%
  knitr::kable(caption="Inactive Players: Goal Counts by Franchise and Position") %>%
  kable_styling()

Inactive Players: Goal Counts by Franchise and Position

| franchiseName | Center Forward | Defenseman | Left Wing Forward | Right Wing Forward |
|---|---|---|---|---|
| Anaheim Ducks | 982 | 622 | 1084 | 915 |
| Arizona Coyotes | 2615 | 1203 | 2162 | 2563 |
| Boston Bruins | 5667 | 2658 | 4986 | 5271 |
| Brooklyn Americans | 519 | 202 | 519 | 403 |
| Buffalo Sabres | 3516 | 1346 | 2852 | 3413 |
| Calgary Flames | 3436 | 1497 | 2295 | 3596 |
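The same kind of goal-count contingency table can be built in base R with xtabs(), which sums the left-hand variable over every franchise/position cell. The sketch uses made-up rows, and the single-letter position codes are an assumption. One difference worth knowing: an empty franchise/position combination shows up as 0 in xtabs, whereas spread() leaves it NA.

```r
# Made-up skater rows; position codes C/D/L/R are assumed for illustration
skaters_toy <- data.frame(
  franchiseName = c("Team A", "Team A", "Team A", "Team B", "Team B"),
  positionCode  = factor(c("C", "C", "D", "C", "L"), levels = c("C", "D", "L", "R")),
  goals         = c(10, 5, 3, 7, 4),
  activePlayer  = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Sum goals for each franchise x position cell, active players only.
# Unused factor levels (here "R") are kept as all-zero columns.
active_tab <- xtabs(goals ~ franchiseName + positionCode,
                    data = subset(skaters_toy, activePlayer))
active_tab
```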

Numerical summaries

Numerical summaries of goals, gamesPlayed, mostGoalsOneGame and mostGoalsOneSeason were created for skaters in each position as follows:

# Numerical Summaries

# Summary statistics (min, quartiles, mean, max) of goal-related columns for one position
skaters_table <- function(pos){
  data <- skaters %>%
    filter(positionCode == pos) %>%
    select(goals, gamesPlayed, mostGoalsOneGame, mostGoalsOneSeason)
  knitr::kable(apply(data, 2, summary), caption = paste("Summary Goals by Position", pos), digits = 1) %>%
    kable_styling()
}

skaters_table("Center Forward")

Summary Goals by Position Center Forward

| | goals | gamesPlayed | mostGoalsOneGame | mostGoalsOneSeason |
|---|---|---|---|---|
| Min. | 0.0 | 1.0 | 0.0 | 0.0 |
| 1st Qu. | 1.0 | 15.0 | 1.0 | 1.0 |
| Median | 6.0 | 55.0 | 1.0 | 5.0 |
| Mean | 27.6 | 121.2 | 1.4 | 9.4 |
| 3rd Qu. | 26.0 | 148.0 | 2.0 | 14.0 |
| Max. | 692.0 | 1607.0 | 7.0 | 92.0 |

skaters_table("Defenseman")

Summary Goals by Position Defenseman

| | goals | gamesPlayed | mostGoalsOneGame | mostGoalsOneSeason |
|---|---|---|---|---|
| Min. | 0.0 | 1 | 0.0 | 0.0 |
| 1st Qu. | 0.0 | 15 | 0.0 | 0.0 |
| Median | 2.0 | 56 | 1.0 | 2.0 |
| Mean | 8.7 | 117 | 0.9 | 3.4 |
| 3rd Qu. | 8.0 | 150 | 1.0 | 5.0 |
| Max. | 395.0 | 1564 | 5.0 | 48.0 |

skaters_table("Left Wing Forward")

Summary Goals by Position Left Wing Forward

| | goals | gamesPlayed | mostGoalsOneGame | mostGoalsOneSeason |
|---|---|---|---|---|
| Min. | 0.0 | 1.0 | 0.0 | 0.0 |
| 1st Qu. | 1.0 | 14.0 | 1.0 | 1.0 |
| Median | 5.0 | 52.0 | 1.0 | 4.0 |
| Mean | 24.1 | 110.5 | 1.4 | 8.8 |
| 3rd Qu. | 24.0 | 141.0 | 2.0 | 14.0 |
| Max. | 730.0 | 1436.0 | 6.0 | 65.0 |

skaters_table("Right Wing Forward")

Summary Goals by Position Right Wing Forward

| | goals | gamesPlayed | mostGoalsOneGame | mostGoalsOneSeason |
|---|---|---|---|---|
| Min. | 0.0 | 1.0 | 0.0 | 0.0 |
| 1st Qu. | 1.0 | 15.0 | 1.0 | 1.0 |
| Median | 6.0 | 53.0 | 1.0 | 5.0 |
| Mean | 28.2 | 117.9 | 1.5 | 9.9 |
| 3rd Qu. | 27.0 | 146.5 | 2.0 | 15.0 |
| Max. | 786.0 | 1687.0 | 5.0 | 86.0 |
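The summary tables above come from apply(data, 2, summary), which runs summary() on each column and binds the six statistics into a matrix with one column per variable. A quick check of that shape on made-up values:

```r
# Made-up columns standing in for goals and gamesPlayed
toy <- data.frame(goals = c(0, 2, 4, 10), gamesPlayed = c(1, 20, 40, 100))

# 6 x 2 matrix: rows are Min., 1st Qu., Median, Mean, 3rd Qu., Max.
stats <- apply(toy, 2, summary)
stats
```

One caveat: if any column contains NA, summary() adds an extra "NA's" entry for that column only, and apply() then returns a list instead of a matrix, so filtering or imputing missing values first keeps the table rectangular.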

Plots

The ggplot2 package (loaded as part of the tidyverse) was used to create plots that describe the data.

Bar plots

To create a bar plot of total goals by skater position, I used the geom_bar function from ggplot2 with stat = "identity", so that bar heights are the y values provided rather than row counts. (geom_col() is shorthand for the same thing.)

skatersData <- skaters %>%
  group_by(positionCode) %>%
  summarise(TotalGoals=sum(goals), TotalGames=sum(gamesPlayed))

ggplot(skatersData, aes(x = positionCode, y=TotalGoals )) + 
  geom_bar(stat="identity") + 
  ggtitle("Bar Plot: Total Goals by PositionCode of Skaters")

Histogram plots

A density-scaled histogram of mostSavesOneGame (the most saves by a goalie in one game) was created with geom_histogram, with a density curve overlaid using geom_density.

# Fetch goalie data
goalie <- as_tibble(NHL_wrapper_api(command="get_goalie_records"))

# Keep only the mostSavesOneGame and activePlayer columns
goalie_msg <- select(goalie, mostSavesOneGame, activePlayer)

# Label activePlayer: FALSE -> "Inactive", TRUE -> "Active"
goalie_msg$activePlayer <- factor(goalie_msg$activePlayer)
levels(goalie_msg$activePlayer) <- c("Inactive", "Active")
# Remove rows with NA values
goalie_msg <- na.omit(goalie_msg)

ggplot(goalie_msg, aes(x = mostSavesOneGame, y = after_stat(density))) +
  geom_histogram(bins = 20) +
  ggtitle("Histogram for Most Saves by a Goalie in One Game") +
  ylab("Density") +
  geom_density(col = "red", lwd = 3, adjust = 0.4)

Using a facet_wrap layer, separate density-scaled histograms of mostSavesOneGame were created for active and inactive goalies as follows:

ggplot(goalie_msg, aes(x = mostSavesOneGame, y = after_stat(density))) +
  geom_histogram(bins = 20) +
  facet_wrap(~activePlayer) +
  ggtitle("Histograms for Most Saves by Active vs Inactive Goalies in One Game") +
  ylab("Density") +
  geom_density(col = "red", lwd = 3, adjust = 0.4)

Box plot

A box plot of points for active and inactive franchises was created with the geom_boxplot layer.
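The box plot code itself is not shown here, so the following is only a sketch of the technique under stated assumptions: the columns activeFranchise and points and all values are made up, and the real plot may have been built from a different table.

```r
library(ggplot2)

# Made-up franchise totals; activeFranchise and points are assumed column names
team_totals <- data.frame(
  activeFranchise = factor(c(TRUE, TRUE, TRUE, FALSE, FALSE),
                           levels = c(FALSE, TRUE), labels = c("Inactive", "Active")),
  points = c(5200, 4100, 900, 350, 120)
)

# One box per franchise status: median, quartiles and whiskers of points
box_plot <- ggplot(team_totals, aes(x = activeFranchise, y = points)) +
  geom_boxplot() +
  ggtitle("Box Plot: Points for Active vs Inactive Franchises")

box_plot
```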

Scatter plot

The geom_point layer creates a scatter plot with ggplot2. Here, wins for active and inactive franchises are plotted together with a fitted linear model line.
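The scatter plot code is not shown here either; this sketch illustrates the technique with made-up data. The x variable (gamesPlayed) and the column names are assumptions, and geom_smooth(method = "lm") fits a separate straight line for each colour group.

```r
library(ggplot2)

# Made-up franchise totals; gamesPlayed, wins and activeFranchise are assumed column names
franchise_totals <- data.frame(
  gamesPlayed     = c(6700, 6500, 3900, 2000, 540, 210),
  wins            = c(3500, 3300, 1900, 900, 200, 60),
  activeFranchise = c("Active", "Active", "Active", "Active", "Inactive", "Inactive")
)

scatter_plot <- ggplot(franchise_totals,
                       aes(x = gamesPlayed, y = wins, color = activeFranchise)) +
  geom_point() +
  # method = "lm" fits wins ~ gamesPlayed within each colour group; se = FALSE hides the band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  ggtitle("Scatter Plot: Wins vs Games Played by Franchise Status")

scatter_plot
```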
