1 + 1
[1] 2
-diff --git a/.DS_Store b/.DS_Store index 51c5e16..c3daac9 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/_site/background.html b/_site/background.html index e2f0136..b828d00 100644 --- a/_site/background.html +++ b/_site/background.html @@ -59,6 +59,8 @@ } } + + @@ -83,6 +85,10 @@
-On average, police in the United States shoot and kill more than 1,000 people every year, according to an ongoing analysis by The Washington Post.
We propose a case study to explore the relationship between police residence and fatal police shootings, employing advanced data science methodologies. Focusing on officers residing in the cities they serve, our project aims to uncover insights and patterns that contribute to a nuanced understanding of this complex issue.
+Methodology:
+Hypothesis and Expected Outcomes:
+We will conduct two hypothesis tests to analyze both:
+the numeric relationship between an increasing proportion of in-city officer residency and the number of fatal police shooting deaths, and
the categorical difference in fatal police shooting deaths between cities where a majority or a minority of police officers live in the city.
Inference for a Difference in Means
+\(H_0\): The mean total number of fatal shootings per agency does not differ based on whether a majority of the officers live in the city.
\(H_A\): The mean total number of fatal shootings per agency is lower in cities where a majority of the officers live in the city than in cities where they do not.
+Inference for a Correlation
+\(H_0\): There is no relationship between the percentage of the total police force that lives in the city they serve and the number of fatal shootings.
\(H_A\): There is a relationship between the percentage of the total police force that lives in the city they serve and the number of fatal shootings.
+\(H_0 : \rho = 0\)
\(H_A : \rho \neq 0\)
In 2015, The Washington Post began tracking details about each police-involved killing in the United States — the race of the deceased, the circumstances of the shooting, whether the person was armed and whether the person was experiencing a mental-health crisis — by manually culling local news reports, collecting information from law enforcement websites and social media, and monitoring independent databases such as Fatal Encounters and the now-defunct Killed by Police project. In many cases, The Post conducts additional reporting.
In 2022, The Post updated its database to standardize and publish the names of the police agencies involved in each shooting to better measure accountability at the department level.
The 2014 killing of Michael Brown in Ferguson, Mo. began a protest movement culminating in the Black Lives Matter movement and an increased focus on police accountability nationwide. In this data set, The Post tracks only shootings with circumstances closely paralleling those like the killing of Brown — incidents in which a police officer, in the line of duty, shoots and kills a civilian. The Post is not tracking deaths of people in police custody, fatal shootings by off-duty officers or non-shooting deaths in this data set.
diff --git a/_site/about.html b/_site/codebook.html similarity index 73% rename from _site/about.html rename to _site/codebook.html index c1b9615..2897ab9 100644 --- a/_site/about.html +++ b/_site/codebook.html @@ -7,7 +7,7 @@ -About this site
-1 + 1
[1] 2
-Name | +Description | +
---|---|
city |
+U.S. city | +
police_force_size |
+Number of police officers serving that city | +
all |
+Percentage of the total police force that lives in the city | +
white |
+Percentage of white (non-Hispanic) police officers who live in the city | +
non-white |
+Percentage of non-white police officers who live in the city | +
black |
+Percentage of black police officers who live in the city | +
hispanic |
+Percentage of Hispanic police officers who live in the city | +
asian |
+Percentage of Asian police officers who live in the city | +
Name | +Description | +
---|---|
id |
+A unique identifier for each fatal police shooting incident. | +
date |
+The date of the fatal shooting. | +
body_camera |
+Whether news reports have indicated an officer was wearing a body camera and it may have recorded some portion of the incident. | +
city |
+The municipality where the fatal shooting took place | +
county |
+County where the fatal shooting took place. | +
state |
+The two-letter postal code abbreviation for the state in which the fatal shooting took place. | +
latitude |
+The latitude location of the shooting expressed as WGS84 coordinates, geocoded from addresses. Please note that the precision and accuracy of incident coordinates varies depending on the precision of the input address which is often only available at the block level. | +
longitude |
+The longitude location of the shooting expressed as WGS84 coordinates, geocoded from addresses. | +
+ | Description | +
---|---|
id |
+Department Database Id | +
name |
+Department Name | +
state |
+State in which the agency is located. | +
I am interested in exploring data related to…
+library(tidyverse)
Warning: package 'lubridate' was built under R version 4.3.1
-── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
-✔ dplyr 1.1.3 ✔ readr 2.1.4
-✔ forcats 1.0.0 ✔ stringr 1.5.0
-✔ ggplot2 3.4.2 ✔ tibble 3.2.1
-✔ lubridate 1.9.3 ✔ tidyr 1.3.0
-✔ purrr 1.0.2
-── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
-✖ dplyr::filter() masks stats::filter()
-✖ dplyr::lag() masks stats::lag()
-ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
-library(usmap)
-library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
-library(infer)
-library(moderndive)
library(tidyverse)
+library(usmap)
+library(sf)
+library(infer)
+library(moderndive)
##Tidying Data
-
-#creating dfs from .csv files
-<- read_csv("data/police-locals.csv") police_locals
Rows: 75 Columns: 10
-── Column specification ────────────────────────────────────────────────────────
-Delimiter: ","
-chr (6): city_old, city, state, black, hispanic, asian
-dbl (4): police_force_size, all, white, non-white
-
-ℹ Use `spec()` to retrieve the full column specification for this data.
-ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
-<- read_csv("data/fatal-police-shootings-agencies.csv") agencies
Rows: 3422 Columns: 6
-── Column specification ────────────────────────────────────────────────────────
-Delimiter: ","
-chr (4): name, type, state, oricodes
-dbl (2): id, total_shootings
-
-ℹ Use `spec()` to retrieve the full column specification for this data.
-ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
-<- read_csv("data/fatal-police-shootings-data.csv") shootings
Rows: 9129 Columns: 19
-── Column specification ────────────────────────────────────────────────────────
-Delimiter: ","
-chr (12): threat_type, flee_status, armed_with, city, county, state, locati...
-dbl (4): id, latitude, longitude, age
-lgl (2): was_mental_illness_related, body_camera
-date (1): date
-
-ℹ Use `spec()` to retrieve the full column specification for this data.
-ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
-#removing old `city` tag from data set that we created when decatenated the city names
-<- police_locals |>
- police_locals select(-city_old)
-
-# creating `agencies` df with just police departments
-<- agencies |>
- agencies filter(grepl("department", tolower(name))) |>
- filter(!grepl("county", tolower(name)))
-
-#creating binned categorical account of if shooting victim was `armed`
-<- shootings |>
- shootings mutate(armed = case_when(is.na(armed_with) ~ "NO",
- == "unarmed" ~ "NO",
- armed_with == "unknown" ~ "NO",
- armed_with == "undetermined" ~ "NO",
- armed_with == "gun" ~ "YES",
- armed_with == "knife" ~ "YES",
- armed_with == "blunt_object" ~ "YES",
- armed_with == "other" ~ "YES",
- armed_with == "replica" ~ "YES",
- armed_with == "vehicle" ~ "YES"))
- armed_with
-#creating df with only agency `names`, `id`, and `state`
-<- agencies |>
- agencies_ids select(name, id, state)
- agencies_ids
# A tibble: 2,057 × 3
- name id state
- <chr> <dbl> <chr>
- 1 Aberdeen Police Department 2576 WA
- 2 Abilene Police Department 2114 TX
- 3 Abington Township Police Department 2088 PA
- 4 Acworth Police Department 3375 GA
- 5 Ada Police Department 2579 OK
- 6 Adel Police Department 3107 GA
- 7 Akron Police Department 815 OH
- 8 Alamogordo Police Department 1434 NM
- 9 Alamosa Police Department 2354 CO
-10 Albany Police Department 1443 GA
-# ℹ 2,047 more rows
-#creating df with `city`, `agency`, and `state` info for each shooting
-<- shootings |>
- shooting_agencies select(city, agency_ids, state)
- shooting_agencies
# A tibble: 9,129 × 3
- city agency_ids state
- <chr> <chr> <chr>
- 1 Shelton 73 WA
- 2 Aloha 70 OR
- 3 Wichita 238 KS
- 4 San Francisco 196 CA
- 5 Evans 473 CO
- 6 Guthrie 101 OK
- 7 Chandler 195 AZ
- 8 Assaria 490 KS
- 9 Burlington 287 IA
-10 Knoxville 26254 PA
-# ℹ 9,119 more rows
-#changing `shooting` var in `shooting_agencies` df to numeric
-$agency_ids <- as.numeric(shootings$agency_ids) shooting_agencies
Warning: NAs introduced by coercion
-#creating df with `city` and `state` info for each agency by joining `agencies_ids` and `shooting_agencies`
-<- agencies_ids |>
- agencies_w_cities left_join(shooting_agencies, by = c("id" = "agency_ids", "state" = "state")) |>
- drop_na(city) |>
- distinct(id, .keep_all = TRUE)
- agencies_w_cities
# A tibble: 1,781 × 4
- name id state city
- <chr> <dbl> <chr> <chr>
- 1 Aberdeen Police Department 2576 WA Aberdeen
- 2 Abilene Police Department 2114 TX Abilene
- 3 Abington Township Police Department 2088 PA Abington Township
- 4 Acworth Police Department 3375 GA Acworth
- 5 Ada Police Department 2579 OK Ada
- 6 Adel Police Department 3107 GA Adel
- 7 Akron Police Department 815 OH Akron
- 8 Alamogordo Police Department 1434 NM Alamogordo
- 9 Alamosa Police Department 2354 CO Alamosa
-10 Albany Police Department 1443 GA Albany
-# ℹ 1,771 more rows
-#creating df with census data for each agency by joining `agencies_w_cities` and `police_locals`
-<- agencies_w_cities |>
- agencies_census full_join(police_locals, by = c("city" = "city", "state" = "state")) |>
- drop_na(police_force_size) |>
- distinct(id, .keep_all = TRUE) |>
- mutate(majority = if_else(all >= 0.5, "TRUE", "FALSE"))
- agencies_census
# A tibble: 109 × 12
- name id state city police_force_size all white `non-white` black
- <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
- 1 Albany P… 2237 NY Alba… 890 0.185 0.160 0.364 **
- 2 Albuquer… 508 NM Albu… 1340 0.616 0.630 0.602 **
- 3 Amtrak P… 1657 IL Chic… 12120 0.875 0.872 0.877 0.89…
- 4 Atlanta … 447 GA Atla… 2950 0.137 0.186 0.111 0.10…
- 5 Austin P… 141 TX Aust… 1985 0.295 0.195 0.427 0.25
- 6 Baltimor… 4784 MD Balt… 2800 0.257 0.133 0.362 0.39…
- 7 Baltimor… 149 MD Balt… 2800 0.257 0.133 0.362 0.39…
- 8 BART Pol… 2015 CA Oakl… 1530 0.0948 0.0267 0.160 0.06…
- 9 Baton Ro… 1098 LA Bato… 980 0.214 0.144 0.321 0.34…
-10 Boston P… 3 MA Bost… 2560 0.477 0.442 0.583 0.68…
-# ℹ 99 more rows
-# ℹ 3 more variables: hispanic <chr>, asian <chr>, majority <chr>
-#creating df of only shootings involving agencies within `agencies` df
-<- shootings |>
- shootings_case right_join(agencies_census, by = c("city" = "city", "state" = "state")) |>
- select(-agency_ids) |>
- rename(agency_ids = id.y, id = id.x, agency = name.y, victim = name.x) |>
- select(-location_precision, -race_source)
Warning in right_join(shootings, agencies_census, by = c(city = "city", : Detected an unexpected many-to-many relationship between `x` and `y`.
-ℹ Row 4 of `x` matches multiple rows in `y`.
-ℹ Row 29 of `y` matches multiple rows in `x`.
-ℹ If a many-to-many relationship is expected, set `relationship =
- "many-to-many"` to silence this warning.
- shootings_case
# A tibble: 3,677 × 27
- id date threat_type flee_status armed_with city county state
- <dbl> <date> <chr> <chr> <chr> <chr> <chr> <chr>
- 1 5 2015-01-03 move not unarmed Wichita Sedgw… KS
- 2 8 2015-01-04 point not replica San Francis… San F… CA
- 3 8 2015-01-04 point not replica San Francis… San F… CA
- 4 22 2015-01-07 threat not knife Columbus Frank… OH
- 5 22 2015-01-07 threat not knife Columbus Frank… OH
- 6 27 2015-01-07 shoot foot gun New Orleans Orlea… LA
- 7 325 2015-01-09 point not gun El Paso El Pa… TX
- 8 46 2015-01-13 shoot foot gun Albuquerque Berna… NM
- 9 46 2015-01-13 shoot foot gun Albuquerque Berna… NM
-10 56 2015-01-15 shoot foot gun Indianapolis Marion IN
-# ℹ 3,667 more rows
-# ℹ 19 more variables: latitude <dbl>, longitude <dbl>, victim <chr>,
-# age <dbl>, gender <chr>, race <chr>, was_mental_illness_related <lgl>,
-# body_camera <lgl>, armed <chr>, agency <chr>, agency_ids <dbl>,
-# police_force_size <dbl>, all <dbl>, white <dbl>, `non-white` <dbl>,
-# black <chr>, hispanic <chr>, asian <chr>, majority <chr>
-##Tidying Data
+
+#creating dfs from .csv files
+police_locals <- read_csv("data/police-locals.csv")
+agencies <- read_csv("data/fatal-police-shootings-agencies.csv")
+shootings <- read_csv("data/fatal-police-shootings-data.csv")
+
+#removing old `city` tag from data set that we created when we de-concatenated the city names
+police_locals <- police_locals |>
+  select(-city_old)
+
+# creating `agencies` df with just police departments
+agencies <- agencies |>
+  filter(grepl("department", tolower(name))) |>
+  filter(!grepl("county", tolower(name)))
+
+#creating binned categorical account of if shooting victim was `armed`
+shootings <- shootings |>
+  mutate(armed = case_when(is.na(armed_with) ~ "NO",
+                           armed_with == "unarmed" ~ "NO",
+                           armed_with == "unknown" ~ "NO",
+                           armed_with == "undetermined" ~ "NO",
+                           armed_with == "gun" ~ "YES",
+                           armed_with == "knife" ~ "YES",
+                           armed_with == "blunt_object" ~ "YES",
+                           armed_with == "other" ~ "YES",
+                           armed_with == "replica" ~ "YES",
+                           armed_with == "vehicle" ~ "YES"))
+
+#creating df with only agency `names`, `id`, and `state`
+agencies_ids <- agencies |>
+  select(name, id, state)
+
+#creating df with `city`, `agency`, and `state` info for each shooting
+shooting_agencies <- shootings |>
+  select(city, agency_ids, state)
+
+#changing `agency_ids` var in `shooting_agencies` df to numeric
+shooting_agencies$agency_ids <- as.numeric(shootings$agency_ids)
+
+#creating df with `city` and `state` info for each agency by joining `agencies_ids` and `shooting_agencies`
+agencies_w_cities <- agencies_ids |>
+  left_join(shooting_agencies, by = c("id" = "agency_ids", "state" = "state")) |>
+  drop_na(city) |>
+  distinct(id, .keep_all = TRUE)
+
+#creating df with census data for each agency by joining `agencies_w_cities` and `police_locals`
+agencies_census <- agencies_w_cities |>
+  full_join(police_locals, by = c("city" = "city", "state" = "state")) |>
+  drop_na(police_force_size) |>
+  distinct(id, .keep_all = TRUE) |>
+  mutate(majority = if_else(all >= 0.5, "TRUE", "FALSE"))
+
+#creating df of only shootings involving agencies within `agencies` df
+shootings_case <- shootings |>
+  right_join(agencies_census, by = c("city" = "city", "state" = "state")) |>
+  select(-agency_ids) |>
+  rename(agency_ids = id.y, id = id.x, agency = name.y, victim = name.x) |>
+  select(-location_precision, -race_source)
<- shootings_case |>
- shootings_by_agency count(agency)
- shootings_by_agency
# A tibble: 108 × 2
- agency n
- <chr> <int>
- 1 Albany Police Department 1
- 2 Albuquerque Police Department 66
- 3 Amtrak Police Department 54
- 4 Atlanta Police Department 36
- 5 Austin Police Department 40
- 6 BART Police Department 13
- 7 Baltimore City Police Department 30
- 8 Baltimore Police Department 30
- 9 Baton Rouge Police Department 15
-10 Boston Police Department 10
-# ℹ 98 more rows
-ggplot(data = shootings_case,
-mapping = aes(x = agency)) +
- geom_bar() +
- theme(axis.text.x = element_text(angle = 90,
- vjust = 1,
- hjust = 1,
- margin = margin(t = 5, b = 5)))
#count shootings by agency
+shootings_by_agency <- shootings_case |>
+  count(agency)
+
+#find top 25 agencies with the most shootings
+top_25_agencies <- shootings_by_agency |>
+  slice_max(n, n = 25)
+
+# visualize top 25 agencies with the most shootings
+ggplot(data = top_25_agencies,
+       mapping = aes(x = agency, y = n)) +
+  geom_col() +
+  theme(axis.text.x = element_text(angle = 75,
+                                   vjust = 1,
+                                   hjust = 1,
+                                   margin = margin(t = 5, b = 5)))
#mapping Locations of Police-Involved Shootings between 2015 and 2023
-
-library(ggmap)
Warning: package 'ggmap' was built under R version 4.3.1
-ℹ Google's Terms of Service: <https://mapsplatform.google.com>
- Stadia Maps' Terms of Service: <https://stadiamaps.com/terms-of-service/>
- OpenStreetMap's Tile Usage Policy: <https://operations.osmfoundation.org/policies/tiles/>
-ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
-library(maps)
Warning: package 'maps' was built under R version 4.3.1
-
-Attaching package: 'maps'
-
-The following object is masked from 'package:purrr':
-
- map
-library(mapdata)
-
-<- map_data("usa")
- usa <- map_data("state")
- states
-ggplot(data = states) +
-geom_polygon(aes(x = long, y = lat, fill = group, group = group), color = "white") +
- coord_fixed(1.3) +
- guides(fill=FALSE) + # do this to leave off the color legend
- geom_point(data = shootings_case, aes(x = longitude, y = latitude), color = "black", size = .2) +
- geom_point(data = shootings_case, aes(x = longitude, y = latitude), color = "red", size = .1) +
- labs(title = "Locations of Police-Involved Shootings between 2015 and 2023",
- captions = "This is only includes cities where we have agency census data.",
- x = "Longitude",
- y = "Latitude")
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
-of ggplot2 3.3.4.
-Warning: Removed 309 rows containing missing values (`geom_point()`).
-Removed 309 rows containing missing values (`geom_point()`).
-#mapping Locations of Police-Involved Shootings between 2015 and 2023
+
+#load geo-viz libraries
+library(ggmap)
+library(maps)
+library(mapdata)
+
+#create blank map
+usa <- map_data("usa")
+states <- map_data("state")
+
+#add locations of shootings to maps
+shot_map <- ggplot(data = states) +
+  geom_polygon(aes(x = long, y = lat, fill = group, group = group), color = "white") +
+  coord_fixed(1.3) +
+  guides(fill = "none") + # do this to leave off the color legend
+  geom_point(data = shootings_case, aes(x = longitude, y = latitude), color = "black", size = .2) +
+  geom_point(data = shootings_case, aes(x = longitude, y = latitude), color = "red", size = .1) +
+  labs(title = "Locations of Police-Involved Shootings between 2015 and 2023",
+       caption = "This only includes cities where we have agency census data.",
+       x = "Longitude",
+       y = "Latitude")
<- agencies_census |>
- agencies_census left_join(shootings_by_agency, by = c("name" = "agency"))
- agencies_census
# A tibble: 109 × 13
- name id state city police_force_size all white `non-white` black
- <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
- 1 Albany P… 2237 NY Alba… 890 0.185 0.160 0.364 **
- 2 Albuquer… 508 NM Albu… 1340 0.616 0.630 0.602 **
- 3 Amtrak P… 1657 IL Chic… 12120 0.875 0.872 0.877 0.89…
- 4 Atlanta … 447 GA Atla… 2950 0.137 0.186 0.111 0.10…
- 5 Austin P… 141 TX Aust… 1985 0.295 0.195 0.427 0.25
- 6 Baltimor… 4784 MD Balt… 2800 0.257 0.133 0.362 0.39…
- 7 Baltimor… 149 MD Balt… 2800 0.257 0.133 0.362 0.39…
- 8 BART Pol… 2015 CA Oakl… 1530 0.0948 0.0267 0.160 0.06…
- 9 Baton Ro… 1098 LA Bato… 980 0.214 0.144 0.321 0.34…
-10 Boston P… 3 MA Bost… 2560 0.477 0.442 0.583 0.68…
-# ℹ 99 more rows
-# ℹ 4 more variables: hispanic <chr>, asian <chr>, majority <chr>, n <int>
-|>
- agencies_census ggplot(mapping = aes(x = all, y = n, fill=)) +
- geom_point() +
- geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)
#creating df with total shootings per agency and census data
+agencies_census <- agencies_census |>
+  left_join(shootings_by_agency, by = c("name" = "agency"))
+
+#prelim visualization of relationship between percentage of officer residency and number of fatal shootings per agency
+agencies_census |>
+  ggplot(mapping = aes(x = all, y = n)) +
+  geom_point() +
+  geom_smooth(method = "lm", se = TRUE)
|>
- shootings_case ggplot(aes(x = majority, fill = armed)) +
- geom_bar() +
- labs(title = "Shootings in Cities where a Majority of Officers Reside",
- captions = "This is only includes shootings where we have agency census data.",
- x = "Does a majority a of the total police force live in the city?",
- y = "Number of fatal shootings",
- fill = "Victim Armed?")
#creating visualization of comparison Shootings in Cities where a Majority/Minority of Officers Reside
+p0 <- shootings_case |>
+  ggplot(aes(x = majority, fill = armed)) +
+  geom_bar() +
+  labs(title = "Shootings in Cities where a Majority of Officers Reside",
+       caption = "This only includes shootings where we have agency census data.",
+       x = "Does a majority of the total police force live in the city?",
+       y = "Number of fatal shootings",
+       fill = "Victim Armed?")
+p0
<- shootings_case |>
- majority_mean filter(majority == TRUE) |>
- count(agency) |>
- summarize(maj_mean = mean(n))
- majority_mean
# A tibble: 1 × 1
- maj_mean
- <dbl>
-1 32.4
-<- shootings_case |>
- minority_mean filter(majority == FALSE) |>
- count(agency) |>
- summarize(min_mean = mean(n))
- minority_mean
# A tibble: 1 × 1
- min_mean
- <dbl>
-1 35
-<- majority_mean - minority_mean
- diff_in_means diff_in_means
#calculate mean number of shootings per agency in cities where a majority of officers reside in the city
+majority_mean <- shootings_case |>
+  filter(majority == TRUE) |>
+  count(agency) |>
+  summarize(maj_mean = mean(n))
+
+#calculate mean number of shootings per agency in cities where a minority of officers reside in the city
+minority_mean <- shootings_case |>
+  filter(majority == FALSE) |>
+  count(agency) |>
+  summarize(min_mean = mean(n))
+
+#calculate a difference in means between the `majority` and `minority`
+diff_in_means <- majority_mean - minority_mean
+diff_in_means
maj_mean
1 -2.575
::kable(head(diff_in_means)) knitr
#tidy table
+knitr::kable(head(diff_in_means))
<- shootings_case %>%
- shootings_by_agency_census group_by(agency) %>%
- count(armed) %>%
- drop_na(n, armed) %>%
- right_join(agencies_census, by = c("agency" = "name")) |>
- distinct(armed, .keep_all = TRUE)
Warning in right_join(., agencies_census, by = c(agency = "name")): Detected an unexpected many-to-many relationship between `x` and `y`.
-ℹ Row 63 of `x` matches multiple rows in `y`.
-ℹ Row 2 of `y` matches multiple rows in `x`.
-ℹ If a many-to-many relationship is expected, set `relationship =
- "many-to-many"` to silence this warning.
- shootings_by_agency_census
# A tibble: 202 × 15
-# Groups: agency [108]
- agency armed n.x id state city police_force_size all white
- <chr> <chr> <int> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
- 1 Albany Police … YES 1 2237 NY Alba… 890 0.185 0.160
- 2 Albuquerque Po… NO 10 508 NM Albu… 1340 0.616 0.630
- 3 Albuquerque Po… YES 54 508 NM Albu… 1340 0.616 0.630
- 4 Amtrak Police … NO 8 1657 IL Chic… 12120 0.875 0.872
- 5 Amtrak Police … YES 46 1657 IL Chic… 12120 0.875 0.872
- 6 Atlanta Police… NO 5 447 GA Atla… 2950 0.137 0.186
- 7 Atlanta Police… YES 31 447 GA Atla… 2950 0.137 0.186
- 8 Austin Police … NO 3 141 TX Aust… 1985 0.295 0.195
- 9 Austin Police … YES 37 141 TX Aust… 1985 0.295 0.195
-10 BART Police De… YES 12 2015 CA Oakl… 1530 0.0948 0.0267
-# ℹ 192 more rows
-# ℹ 6 more variables: `non-white` <dbl>, black <chr>, hispanic <chr>,
-# asian <chr>, majority <chr>, n.y <int>
-<- shootings_by_agency_census |>
- shootings_by_agency_census select(n.x, armed, all)
Adding missing grouping variables: `agency`
-<- lm(n.x ~ all + armed, data = shootings_by_agency_census)
- fit_multi fit_multi
#add `armed` and `majority` to `shootings_by_agency` df
+shootings_by_agency_census <- shootings_case |>
+  group_by(agency) |>
+  count(armed) |>
+  drop_na(n, armed) |>
+  right_join(agencies_census, by = c("agency" = "name")) |>
+  distinct(armed, .keep_all = TRUE)
+
+shootings_by_agency_census <- shootings_by_agency_census |>
+  select(n.x, armed, all)
+
+#fit multiple linear regression model for correlation between percentage of officer residency and victim armament and number of fatal shootings per agency
+fit_multi <- lm(n.x ~ all + armed, data = shootings_by_agency_census)
+fit_multi
Call:
@@ -646,8 +424,9 @@ Data
(Intercept) all armedYES
4.117 1.211 24.921
<- get_regression_table(fit_multi)
- p2 ::kable(head(p2)) knitr
#tidy `fit_multi`
+p2 <- get_regression_table(fit_multi)
+knitr::kable(head(p2))
ggplot(data = shootings_by_agency_census, aes(x = all, y = n.x)) +
-geom_jitter(jitter = 15, alpha = 0.5) +
- geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
- labs(title = "Number of Shootings on a Scale of Police Force Residency",
- x = "Percentage of the total police force that lives in the city",
- y = "Number of fatal shootings in that city")
Warning in geom_jitter(jitter = 15, alpha = 0.5): Ignoring unknown parameters:
-`jitter`
-#visualize polynomial relationship between percentage of officer residency and number of fatal shootings per agency
+ggplot(data = shootings_by_agency_census, aes(x = all, y = n.x)) +
+geom_jitter(width = 0.10, height = 0, alpha = 0.45) +
+ geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE) +
+ labs(title = "Number of Shootings on a Scale of Police Force Residency",
+ x = "Percentage of the total police force that lives in the city",
+ y = "Number of fatal shootings in that city")
#visualize polynomial relationship between percentage of officer residency and victim armament and number of fatal shootings per agency
+ggplot(data = shootings_by_agency_census, aes(x = all, y = n.x, color = armed)) +
+geom_jitter(width = 0.10, height = 0, alpha = 0.45) +
+ geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE) +
+ labs(title = "Number of Shootings on a Scale of Police Force Residency",
+ x = "Percentage of the total police force that lives in the city",
+ y = "Number of fatal shootings in that city",
+ color = "Victim Armed?")
ggplot(data = shootings_by_agency_census, aes(x = all, y = n.x, color = armed)) +
-geom_jitter(jitter = 15, alpha = 0.5) +
- geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
- labs(title = "Number of Shootings on a Scale of Police Force Residency",
- x = "Percentage of the total police force that lives in the city",
- y = "Number of fatal shootings in that city")
Warning in geom_jitter(jitter = 15, alpha = 0.5): Ignoring unknown parameters:
-`jitter`
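The model equation for fit is:
\[ \widehat{\text{fatal shootings}} = 35.7782 - 0.5874 \times \text{all} \]
+Interpretation:
+The intercept, \(35.7782\), is the estimated number of fatal shootings when the percentage of officer residency (all) is \(0\). For each one-unit increase in the percentage of officer residency, the number of fatal shootings is expected to decrease by \(0.5874\) units, assuming all other factors remain constant.
This point estimate suggests a negative association between the percentage of officer residency and the number of fatal shootings. However, the regression table reports a p-value of \(0.969\) for this coefficient and a confidence interval running from \(-30.130\) to \(28.955\), so the association cannot be distinguished from zero. The result should also be read in the context of the data and of potential confounding factors, such as whether or not the victim was armed.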
+The model equation for fit_multi, considering victim armament (armed), is:
\[ \widehat{\text{fatal shootings}} = 4.117 + 1.211 \times \text{all} + 24.921 \times \text{armedYES} \]
+The intercept, \(4.117\), is the estimated number of fatal shootings where the percentage of officer in-city residency (all) is \(0\) and the victim was unarmed. For each one-unit increase in the percentage of in-city officer residency compared to the total force (all), we expect an increase of \(1.211\) fatal shootings, assuming the victim's armament status (armedYES) remains constant.
The coefficient for armedYES, \(24.921\), indicates that when the victim is armed (armed is YES), we expect an increase of \(24.921\) fatal shootings compared to when the victim is not armed (armed is NO), assuming the percentage of officer residency (all) remains constant.
In summary, the model suggests that both the percentage of officer residency and whether the victim was armed are associated with the number of fatal shootings per agency. However, correlation does not imply causation, and other factors not included in the model may influence the outcomes.
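As a rough check on this reading, the fitted equation can be evaluated by hand. The sketch below is illustrative only: it hard-codes the coefficients printed for fit_multi, and the helper name predict_shootings is hypothetical, not part of the original analysis. Note that all is stored as a proportion between 0 and 1 in agencies_census, so a "one-unit increase" corresponds to moving from 0% to 100% in-city residency.

```r
# Coefficients copied from the printed fit_multi output in the text
coefs <- c(intercept = 4.117, all = 1.211, armedYES = 24.921)

# Hypothetical helper (an assumption for illustration, not from the analysis):
# evaluates the fitted equation for a residency proportion and armed status
predict_shootings <- function(all, armed) {
  armed_yes <- as.numeric(armed == "YES")  # dummy variable: 1 if armed, else 0
  coefs[["intercept"]] + coefs[["all"]] * all + coefs[["armedYES"]] * armed_yes
}

# Half of the force living in the city, victim unarmed vs. armed
print(predict_shootings(all = 0.5, armed = "NO"))   # 4.7225
print(predict_shootings(all = 0.5, armed = "YES"))  # 29.6435
```

At any fixed value of all, the two predictions differ by exactly the armedYES coefficient, which is what "holding residency constant" means in the interpretation above.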
+#generate null distribution
+null_dist <- agencies_census |>
+  specify(n ~ majority) |>
+  hypothesize(null = "independence") |>
+  generate(reps = 1000, type = "permute") |>
+  calculate(stat = "diff in means", order = c("TRUE", "FALSE"))
+
+#compute observed test statistic
+test_stat <- agencies_census |>
+  specify(n ~ majority) |>
+  calculate(stat = "diff in means", order = c("TRUE", "FALSE"))
+
+#visualize p-value
+null_dist |>
+  visualize() +
+  shade_p_value(obs_stat = test_stat, direction = "less")
#compute p-value
+null_dist |>
+  get_p_value(obs_stat = test_stat, direction = "less")
# A tibble: 1 × 1
+ p_value
+ <dbl>
+1 0.263
Inference for a Difference in Means
+\(H_0 : \mu_{maj} - \mu_{min} = 0\), or equivalently \(H_0 : \mu_{maj} = \mu_{min}\)
\(H_A : \mu_{maj} - \mu_{min} < 0\), or equivalently \(H_A : \mu_{maj} < \mu_{min}\)
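The permutation logic behind a one-sided test of these hypotheses can be sketched in base R. This is a minimal illustration on invented toy data: the values in toy and the helper diff_stat are assumptions, not results from the study. It mirrors the idea of shuffling the majority labels and recomputing the difference in means, not the exact internals of the infer package.

```r
set.seed(1)

# Toy data, invented for illustration only (not taken from agencies_census)
toy <- data.frame(
  n = c(12, 30, 7, 45, 22, 9, 51, 18),
  majority = c("TRUE", "TRUE", "TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE")
)

# Hypothetical helper: mean(majority group) - mean(minority group)
diff_stat <- function(n, group) {
  mean(n[group == "TRUE"]) - mean(n[group == "FALSE"])
}

# Observed difference in means on the toy data
obs <- diff_stat(toy$n, toy$majority)

# Null distribution: shuffle the group labels, recompute the statistic
null_stats <- replicate(1000, diff_stat(toy$n, sample(toy$majority)))

# One-sided ("less") p-value: share of shuffled statistics at or below the observed one
p_value <- mean(null_stats <= obs)

print(obs)
print(p_value)
```

A small p-value would indicate that a difference this far below zero is rare when residency status and shooting counts are unrelated; a large one means the observed gap is compatible with chance.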
+#generate null distribution
+null_dist_cor <- agencies_census |>
+  specify(n ~ white) |>
+  hypothesize(null = "independence") |>
+  generate(reps = 1000, type = "permute") |>
+  calculate(stat = "correlation")
+
+#compute observed test statistic
+test_stat_cor <- agencies_census |>
+  specify(n ~ white) |>
+  calculate(stat = "correlation")
+test_stat_cor
Response: n (numeric)
+Explanatory: white (numeric)
+# A tibble: 1 × 1
+ stat
+ <dbl>
+1 -0.0470
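+#visualize p-value
+#use the correlation statistic `test_stat_cor`; `test_stat` holds the difference in means from the previous test
+null_dist_cor |>
+  visualize() +
+  shade_p_value(obs_stat = test_stat_cor, direction = "two.sided")
#compute p-value
+null_dist_cor |>
+  get_p_value(obs_stat = test_stat_cor, direction = "two.sided")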
# A tibble: 1 × 1
+ p_value
+ <dbl>
+1 0
+Inference for a Correlation
+\(H_0\): There is no relationship between the percentage of the total police force that lives in the city they serve and the number of fatal shootings.
\(H_A\): There is a relationship between the percentage of the total police force that lives in the city they serve and the number of fatal shootings.
+\(H_0 : \rho = 0\)
\(H_A : \rho \neq 0\)
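A two-sided permutation test of \(\rho = 0\) can be sketched in base R as follows. The data are invented for illustration (the toy_cor values are assumptions, not figures from the study), and the two-sided p-value is computed here as the share of shuffled correlations at least as extreme in absolute value as the observed one, which may differ slightly from how the infer package defines it. The key point is that the observed statistic compared against the null distribution must itself be a correlation.

```r
set.seed(2)

# Toy data, invented for illustration only
toy_cor <- data.frame(
  all = c(0.10, 0.25, 0.40, 0.55, 0.70, 0.85),  # residency proportion
  n   = c(40, 12, 33, 8, 27, 15)                # fatal shootings per agency
)

# Observed correlation between residency proportion and shooting count
obs_cor <- cor(toy_cor$all, toy_cor$n)

# Null distribution: shuffle the shooting counts to break any association
null_cors <- replicate(1000, cor(toy_cor$all, sample(toy_cor$n)))

# Two-sided p-value (absolute-value definition, an assumption of this sketch)
p_value_cor <- mean(abs(null_cors) >= abs(obs_cor))

print(obs_cor)
print(p_value_cor)
```

Because every shuffled statistic is a correlation, it lies between -1 and 1; an observed value outside that range would signal that the wrong statistic was passed to the p-value calculation.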
fit
fit_multi
++On average, police in the United States shoot and kill more than 1,000 people every year…and then they go home to their families
+
This case study investigates the relationship between police residence and fatal police shootings, using a data science approach to uncover patterns across law enforcement agencies. Focusing on police officers who reside in the cities they serve, the study examines whether residency correlates with the incidence of fatal police shootings. The data set, spanning the years 2015 to 2023, is composed of information on police agencies involved in at least one fatal shooting, and is analyzed using linear regression and permutation-based hypothesis tests.
This study aims to discern patterns, trends, and potential biases associated with the geographical proximity of police officers to the communities they police. A comprehensive exploration of demographic, socioeconomic, and policing variables contributes to a nuanced understanding of the factors influencing fatal police shootings. Furthermore, the study seeks to identify any disparities in incident rates based on officers’ residency status, considering variables such as race, community demographics, and departmental policies.
The insights derived from this case study bear substantial implications for informing public policy, refining police training protocols, and strengthening community relations. By unraveling the nuanced dynamics surrounding police residence and fatal police shootings, this case study aims to provide evidence-based recommendations to enhance transparency, accountability, and trust between law enforcement agencies and the communities they serve. In doing so, it contributes to the broader discourse on police reform, fostering a data-driven approach to address critical issues and promote safer, more resilient communities.
-Name | -Description | +maj_mean |
---|---|---|
city |
-U.S. city | -|
police_force_size |
-Number of police officers serving that city | -|
all |
-Percentage of the total police force that lives in the city | -|
white |
-Percentage of white (non-Hispanic) police officers who live in the city | -|
non-white |
-Percentage of non-white police officers who live in the city | -|
black |
-Percentage of black police officers who live in the city | -|
hispanic |
-Percentage of Hispanic police officers who live in the city | -|
asian |
-Percentage of Asian police officers who live in the city | +-2.575 |
Incident Information
-Name | -Description | +term | +estimate | +std_error | +statistic | +p_value | +lower_ci | +upper_ci |
---|---|---|---|---|---|---|---|---|
id |
-A unique identifier for each fatal police shooting incident. | -|||||||
date |
-The date of the fatal shooting. | -|||||||
body_camera |
-Whether news reports have indicated an officer was wearing a body camera and it may have recorded some portion of the incident. | -|||||||
city |
-The municipality where the fatal shooting took place | -|||||||
county |
-County where the fatal shooting took place. | +intercept | +35.778 | +6.887 | +5.195 | +0.000 | +22.126 | +49.430 |
state |
-The two-letter postal code abbreviation for the state in which the fatal shooting took place. | -|||||||
latitude |
-The latitude location of the shooting expressed as WGS84 coordinates, geocoded from addresses. Please note that the precision and accuracy of incident coordinates varies depending on the precision of the input address which is often only available at the block level. | -|||||||
longitude |
-The longitude location of the shooting expressed as WGS84 coordinates, geocoded from addresses. | +all | +-0.587 | +14.902 | +-0.039 | +0.969 | +-30.130 | +28.955 |
Agency Information
-- | Description | +term | +estimate | +std_error | +statistic | +p_value | +lower_ci | +upper_ci |
---|---|---|---|---|---|---|---|---|
id |
-Department Database Id | +intercept | +4.117 | +3.362 | +1.225 | +0.222 | +-2.512 | +10.747 |
name |
-Department Name | +all | +1.211 | +6.335 | +0.191 | +0.849 | +-11.281 | +13.702 |
state |
-State in which the agency is located. | +armed: YES | +24.921 | +2.891 | +8.619 | +0.000 | +19.219 | +30.622 |
I am interested in exploring data related to…
+The intercept, \(4.117\), is the estimated number of fatal shootings when the percentage of in-city officer residency (all
) is \(0\) and the victim was unarmed. For each one-unit increase in in-city officer residency as a share of the total force (all
), we expect an increase of \(1.211\) fatal shootings, holding the victim’s armament status (armedYES
) constant.
The coefficient for ‘armedYES’, \(24.921\), indicates that when the victim is armed (armed
is YES
), we expect \(24.921\) more fatal shootings than when the victim is not armed (armed
is No
), holding the percentage of officer residency (all
) constant.
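In summary, the model estimates positive associations between the number of fatal shootings per agency and both the percentage of officer residency and whether the victim is armed. Only the armament coefficient is statistically distinguishable from zero, however: the confidence interval for officer residency, \((-11.281, 13.702)\), includes \(0\) (p-value \(0.849\)). Moreover, correlation does not imply causation, and other factors not included in the model may influence the outcomes.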
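As a worked illustration of the interpretation above, the fitted model can be written out using the coefficients reported in the regression table. The value \(\text{all} = 0.5\) below is purely illustrative (an assumption, expressed in whatever units the data record), and \(\mathbb{1}(\cdot)\) denotes an indicator that equals \(1\) when the victim is armed and \(0\) otherwise.

\[
\widehat{n} = 4.117 + 1.211 \cdot \text{all} + 24.921 \cdot \mathbb{1}(\text{armed} = \text{YES})
\]

\[
\text{unarmed: } 4.117 + 1.211(0.5) = 4.7225, \qquad \text{armed: } 4.7225 + 24.921 = 29.6435
\]

The two predictions differ by exactly the armament coefficient, which is what "holding officer residency constant" means in this additive model.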
+#visualize polynomial relationship between percentage of officer residency and victim armament and number of fatal shootings per agency
+ggplot(data = shootings_by_agency_census, aes(x = all, y = n.x, color = armed)) +
+geom_jitter(width = 0.10, height = 0, alpha = 0.45) +
+ geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE) +
+ labs(title = "Number of Shootings on a Scale of Police Force Residency",
+ x = "Percentage of the total police force that lives in the city",
+ y = "Number of fatal shootings in that city",
+ color = "Victim Armed?")
#generate null distribution
+null_dist <- agencies_census |>
+ specify(n ~ majority) |>
+ hypothesize(null = "independence") |>
+ generate(reps = 1000, type = "permute") |>
+ calculate(stat = "diff in means", order = c("TRUE", "FALSE"))
+
+#compute observed test statistic
+test_stat <- agencies_census |>
+ specify(n ~ majority) |>
+ calculate(stat = "diff in means", order = c("TRUE", "FALSE"))
+
+#visualize p-value
+null_dist |>
+ visualize() +
+ shade_p_value(obs_stat = test_stat, direction = "less")
#compute p-value
+null_dist |>
+ get_p_value(obs_stat = test_stat, direction = "less")
# A tibble: 1 × 1
+ p_value
+ <dbl>
+1 0.252
+At a significance level of \(\alpha = 0.05\), the p-value of \(0.252\) suggests that there is insufficient evidence to reject the null hypothesis. Our null hypothesis asserts that the mean total number of fatal shootings per agency does not differ based on whether a majority of the officers live in the city. The p-value indicates that, assuming the null is true, the probability of observing a difference in means (\(\mu_{maj} - \mu_{min}\)) at least as low as our observed test statistic of \(-4.92\) is around \(25\%\) (\(0.252\)). In other words, a difference this large between the groups could plausibly have occurred by random chance.
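To make the meaning of this p-value concrete, the sketch below reproduces the same permutation logic in base R. It is an illustration only: the data are simulated placeholders rather than `agencies_census`, and the names `n_shootings`, `majority`, and `diff_in_means` are hypothetical.

```r
# Base-R sketch of a one-sided permutation test for a difference in means.
# The data are simulated placeholders, NOT the agencies_census data.
set.seed(1)
n_shootings <- c(rpois(30, lambda = 8), rpois(30, lambda = 10))
majority    <- rep(c(TRUE, FALSE), each = 30)

# Statistic: mean for majority == TRUE minus mean for majority == FALSE,
# matching order = c("TRUE", "FALSE") in the infer pipeline.
diff_in_means <- function(y, g) mean(y[g]) - mean(y[!g])
obs_stat <- diff_in_means(n_shootings, majority)

# Null distribution: shuffle the group labels, which breaks any link
# between residency group and shooting counts.
reps <- 1000
null_stats <- replicate(reps, diff_in_means(n_shootings, sample(majority)))

# One-sided ("less") p-value: share of permuted statistics that are
# at least as low as the observed one.
p_value <- mean(null_stats <= obs_stat)
p_value
```

Because the labels are shuffled at random, the p-value varies slightly from run to run unless a seed is set, which is why repeated renders of a permutation test can report slightly different values.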
+#generate null distribution
+null_dist_cor <- agencies_census |>
+ specify(n ~ white) |>
+ hypothesize(null = "independence") |>
+ generate(reps = 1000, type = "permute") |>
+ calculate(stat = "correlation")
+
+#compute observed test statistic
+test_stat_cor <- agencies_census |>
+ specify(n ~ white) |>
+ calculate(stat = "correlation")
+
+
+#visualize p-value
+null_dist_cor |>
+ visualize() +
+ shade_p_value(obs_stat = test_stat_cor, direction = "two.sided")
#compute p-value
+null_dist_cor |>
+ get_p_value(obs_stat = test_stat_cor, direction = "two.sided")
# A tibble: 1 × 1
+ p_value
+ <dbl>
+1 0
+At a significance level of \(\alpha = 0.05\), the p-value of \(0\) suggests that there is sufficient evidence to reject the null hypothesis. Our null hypothesis asserts that there is no relationship between the percentage of the total police force that lives in the city they serve and the number of fatal shootings (\(\rho = 0\)). The p-value indicates that, assuming the null is true, the probability of observing a correlation coefficient at least as extreme as our observed \(-0.0470\) is around \(0\%\); with \(1000\) permutations, a reported p-value of \(0\) means that none of the permuted statistics was that extreme. In other words, our observed correlation coefficient would be unlikely if there were no relationship between the percentage of officer residency and the number of fatal shootings for a given agency.
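The sketch below shows, in base R, how a two-sided permutation p-value for a correlation can be computed and why a simulation-based p-value of exactly \(0\) is best read as "smaller than \(1/\text{reps}\)". The data are simulated placeholders, all variable names are hypothetical, and the absolute-value definition of "two-sided" used here is one common convention that may differ from the one `infer` applies internally.

```r
# Base-R sketch of a two-sided permutation test for a correlation.
# The data are simulated placeholders, NOT the agencies_census data.
set.seed(2)
pct_resident <- runif(50, min = 0, max = 1)
n_shootings  <- rpois(50, lambda = 8)

# Observed statistic: Pearson correlation between the two variables.
obs_cor <- cor(pct_resident, n_shootings)

# Null distribution: permute one variable to break any association.
reps <- 1000
null_cors <- replicate(reps, cor(pct_resident, sample(n_shootings)))

# Count permuted correlations at least as far from 0 as the observed one.
n_extreme <- sum(abs(null_cors) >= abs(obs_cor))

# Raw proportion: can be exactly 0 when no permuted statistic is as extreme.
p_raw <- n_extreme / reps

# Add-one adjustment: never exactly 0, so the smallest reportable value
# is 1 / (reps + 1).
p_adj <- (n_extreme + 1) / (reps + 1)

c(raw = p_raw, adjusted = p_adj)
```

The observed statistic passed to the p-value step has to be on the same scale as the null distribution (here, a correlation between \(-1\) and \(1\)); a statistic from a different test would fall outside the null distribution entirely and yield a misleading p-value of \(0\).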
+