From d87666fd22130c4d33cc8199f30ec7a7920e49aa Mon Sep 17 00:00:00 2001 From: andybeet <22455149+andybeet@users.noreply.github.com> Date: Thu, 21 Mar 2024 21:33:34 -0400 Subject: [PATCH 1/3] edits made to comdat/landings data rmd --- chapters/landings_data.Rmd | 41 ++++++++++++++++---------------------- 1 file changed, 17 insertions(+), 24 deletions(-) diff --git a/chapters/landings_data.Rmd b/chapters/landings_data.Rmd index 292c996d..d4073280 100644 --- a/chapters/landings_data.Rmd +++ b/chapters/landings_data.Rmd @@ -8,17 +8,15 @@ **Contributor(s)**: Sean Lucey -**Data steward**: Sean Lucey, +**Data steward**: Sean Lucey, [Sean.Lucey\@noaa.gov](mailto:Sean.Lucey@noaa.gov){.email} -**Point of contact**: Sean Lucey, - -**Public availability statement**: Raw data are not publicly available due to confidentiality of individual fishery participants. Derived indicator outputs are -available [here](https://comet.nefsc.noaa.gov/erddap/tabledap/group_landings_soe_v1.html). +**Point of contact**: Sean Lucey, [Sean.Lucey\@noaa.gov](mailto:Sean.Lucey@noaa.gov){.email} +**Public availability statement**: Raw data are not publicly available due to confidentiality of individual fishery participants. ## Methods -Fisheries dependent data for the Northeast Shelf extend back several decades. Data from the 1960s on are housed in the Commercial database (CFDBS) of the Northeast Fisheries Science Center which contains the commercial fisheries dealer purchase records (weigh-outs) collected by National Marine Fisheries Service (NMFS) Statistical Reporting Specialists and state agencies from Maine to Virginia. The data format has changed slightly over the time series with three distinct time frames as noted in Table \@ref(tab:calibration1) below. +Fisheries dependent data for the Northeast Shelf extend back several decades. 
Data from the 1960s onward are housed in the Commercial database (CFDBS) of the Northeast Fisheries Science Center, which contains the commercial fisheries dealer purchase records (weigh-outs) collected by National Marine Fisheries Service (NMFS) Statistical Reporting Specialists and state agencies from Maine to Virginia. The data format has changed slightly over the time series with three distinct time frames as noted in Table \@ref(tab:calibration1) below. ```{r calibration1, eval = T, echo = F} com.tables <- data.frame(Table = c('WOLANDS', 'WODETS', 'CFDETS_AA'), @@ -28,17 +26,16 @@ knitr::kable(com.tables, caption="Data formats", booktabs = T) #%>% ``` -Comlands is an R database pull that consolidates the landings records from 1964 on and attempts to associate them with NAFO statistical areas (Figure \@ref(fig:StatAreaMap)). The script is divided into three sections. The first pulls domestic landings data from the yearly landings tables and merges them into a single data source. The second section applies an algorithm to associate landings that are not allocated to a statistical area using similar characteristics of the trip to trips with known areas. The final section pulls foreign landings from the Northwest Atlantic Fisheries Organization website and rectifies species and gear codes so they can be merged along with domestic landings. +Landings records from 1964 to the present are pulled from the Commercial database, and an algorithm is applied to assign landings that are not allocated to a statistical area, based on trips with known areas that share similar characteristics. Foreign landings are then pulled from the Northwest Atlantic Fisheries Organization ([NAFO](https://www.nafo.int/)) website and merged with domestic landings. -```{r StatAreaMap, fig.cap="Map of the North Atlantic Fisheries Organization (NAFO) Statistical Areas. 
Colors represent the Ecological Production Unit (EPU) with which the statistical area is associated.", echo=F, eval=T, out.width = "50%", fig.align = "center"} +```{r StatAreaMap, fig.cap="Map of the Greater Atlantic Region Statistical Areas. Colors represent the Ecological Production Unit (EPU) with which the statistical area is associated.", echo=F, eval=T, out.width = "50%", fig.align = "center"} image.dir <- here::here('images') knitr::include_graphics(file.path(image.dir, 'Stat_Area_Map.jpg')) ``` -During the first section, the Comlands script pulls the temporal and spatial information as well as vessel and gear characteristics associated with the landings in addition to the weight, value, and utilization code of each species in the landings record. The script includes a toggle to use landed weights as opposed to live weights. For all but shellfish species, live weights are used for the State of the Ecosystem report. Due to the volume of data contained within each yearly landings table, landings are aggregated by species, utilization code, and area as well as by month, gear, and tonnage class. All weights are then converted from pounds to metric tons. Landings values are also adjusted for inflation using the Producer Price Index by Commodity for Processed Foods and Feeds: Unprocessed and Packaged Fish. Inflation is based on January of the terminal year of the data pull ensuring that all values are in current dollar prices. - +The R package [`comlandr`](https://noaa-edab.github.io/comlandr/) is used to pull the data. Specifically, the package pulls the temporal and spatial information, vessel and gear characteristics, and the weight, value, and utilization code of each species in the landings record. The package can return either landed weights or live weights. For all but shellfish species, live weights are used for the State of the Ecosystem report. 
Landings are aggregated by species, utilization code, and area as well as by month, gear, and tonnage class. All weights are then converted from pounds to metric tons. Landings values are also adjusted for inflation using the Producer Price Index by Commodity for Processed Foods and Feeds: Unprocessed and Packaged Fish. Inflation is based on January of the terminal year of the data pull, ensuring that all values are in current dollar prices. ```{r geartypes, eval = T, echo = F} @@ -53,9 +50,9 @@ knitr::kable(gear.table, caption = "Gear types used in commercial landings", bo #kableExtra::kable_styling(full_width = F) ``` -Several species have additional steps after the data is pulled from CFDBS. Skates are typically landed as a species complex. In order to segregate the catch into species, the ratio of individual skate species in the NEFSC bottom trawl survey is used to disaggregate the landings. A similar algorithm is used to separate silver and offshore hake which can be mistaken for one another. Finally, Atlantic herring landings are pulled from a separate database as the most accurate weights are housed by the State of Maine. Comlands pulls from the State database and replaces the less accurate numbers from the federal database. +Several species have additional steps after the data are pulled from CFDBS. Skates are typically landed as a species complex. To segregate the catch into species, the ratio of individual skate species in the NEFSC bottom trawl survey is used to disaggregate the landings. A similar algorithm is used to separate silver and offshore hake, which can be mistaken for one another. Finally, Atlantic herring landings are pulled from a separate database because the most accurate weights are housed by the State of Maine. The `comlandr` package pulls from the State database and replaces the less accurate numbers from the federal database. -The majority of landings data are associated with a NAFO Statistical Area. 
For those that are not, Comlands attempts to assign them to an area using similar characteristics of trips where the area is known. To simplify this task, landings data are further aggregated into quarter and half year, small and large vessels, and eight major gear categories (Table \@ref(tab:geartypes)). Landings are then proportioned to areas that meet similar characteristics based on the proportion of landings in each area by that temporal/vessel/gear combination. If a given attribute is unknown, the algorithm attempts to assign it one, once again based on matched characteristics of known trips. Statistical areas are then assigned to their respective [Ecological Production Unit](#epu) (Table \@ref(tab:statareas)). +The majority of landings data are associated with a Greater Atlantic Region Statistical Area (Figure \@ref(fig:StatAreaMap)). For those that are not, the package attempts to assign them to an area using similar characteristics of trips where the area is known. To simplify this task, landings data are further aggregated into quarter and half year, small and large vessels, and eight major gear categories (Table \@ref(tab:geartypes)). Landings are then apportioned among areas based on the proportion of known-area landings in each area for the same temporal/vessel/gear combination. If a given attribute is unknown, the algorithm attempts to assign one, again based on matched characteristics of known trips. Statistical areas are then assigned to their respective [Ecological Production Unit](#epu) (Table \@ref(tab:statareas)). ```{r statareas, eval = T, echo = F} area.table <- data.frame(EPU = c('Gulf of Maine', 'Georges Bank', 'Mid-Atlantic'), @@ -67,28 +64,24 @@ kable(area.table, caption = "Statistical areas making up each EPU") %>% kable_styling(latex_options = "HOLD_position") ``` -The final step of Comlands is to pull the foreign landings from the [NAFO database](https://www.nafo.int/Data/frames). 
US landings are removed from this extraction so as not to be double counted. NAFO codes and CFDBS codes differ so the script rectifies those codes to ensure that the data is seamlessly merged into the domestic landings. Foreign landings are flagged so that they can be removed if so desired. - +The final step is to pull the foreign landings from the [NAFO database](https://www.nafo.int/Data/). US landings are removed from this extraction so they are not double counted. NAFO codes and CFDBS codes differ, so the package reconciles them to ensure that the foreign data merge seamlessly with the domestic landings. Foreign landings are flagged so that they can be removed if desired. ### Data sources -Comland is a database query of the NEFSC commercial fishery database (CFDBS). More information about the CFDBS is available [here](https://inport.nmfs.noaa.gov/inport/item/27401). -### Data extraction +Landings data come from a database query of the NEFSC commercial fishery database (CFDBS). More information about the CFDBS is available [here](https://inport.nmfs.noaa.gov/inport/item/27401). -[`comlandr`](https://github.com/NOAA-EDAB/comlandr) is a package used to extract relevant data from the database. +### Data extraction +[`comlandr`](https://noaa-edab.github.io/comlandr/) is an R package used to extract relevant data from the database. #### Data Processing -The landings data were formatted for inclusion in the `ecodata` R package with this [R code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_comdat.R). +The landings data were formatted for inclusion in the [`ecodata`](https://noaa-edab.github.io/ecodata/) R package. ### Data analysis -Fisheries dependent data from Comlands is used in several indicators for the State of the Ecosystem report; the more complicated analyses are detailed in their own sections (ie. [bennet index](#bennet)). The most straightforward use of this data are the region total and aggregate landings indicators. 
Regional totals sum landings three ways: 1) All landings regardless of management authority and eventual use (i.e. food or bait), 2) All landings used for seafood but regardless of management authority, and 3) All landings used for seafood and managed by the regional fisheries management council for whom the report is presented. +Fisheries-dependent data are used in several indicators for the State of the Ecosystem report; the more complicated analyses are detailed in their own sections (e.g., the [Bennet index](#bennet)). The most straightforward uses of these data are the regional total and aggregate landings indicators. Regional totals sum landings three ways: 1) all landings regardless of management authority and eventual use (i.e., food or bait), 2) all landings used for seafood regardless of management authority, and 3) all landings used for seafood and managed by the regional fisheries management council for which the report is prepared. -Landings are also calculated by aggregate groups per region. These are calculated by first assigning the various species into [aggregate groups](#aggroups). Landings are then summed by year, [EPU](#epu), aggregate group, and whether they are managed by the regional fisheries management council or not. Proportions of managed landings to total landings are also calculated and have been reported in some reports. +Landings are also calculated by aggregate group for each region. Species are first assigned to [aggregate groups](#species_groupings). Landings are then summed by year, [EPU](#epu), aggregate group, and whether or not they are managed by the regional fisheries management council. Proportions of managed landings to total landings are also calculated and have been included in some reports. - These are calculated by first assigning the various species into [aggregate groups](#aggroups). 
Landings are then summed by year, [EPU](#epu), aggregate group, and whether they are managed by the regional fisheries management council or not. Proportions of managed landings to total landings are also calculated and have been reported in some reports. - -**catalog link** - \ No newline at end of file +**catalog link** From e86e57b52e2447c9c96121a87321c65a47fac4a8 Mon Sep 17 00:00:00 2001 From: andybeet <22455149+andybeet@users.noreply.github.com> Date: Thu, 21 Mar 2024 21:34:09 -0400 Subject: [PATCH 2/3] edited survey data rmd --- chapters/survey_data.rmd | 57 ++++++++++++---------------------------- 1 file changed, 17 insertions(+), 40 deletions(-) diff --git a/chapters/survey_data.rmd b/chapters/survey_data.rmd index 1b7a3d10..62d228e0 100644 --- a/chapters/survey_data.rmd +++ b/chapters/survey_data.rmd @@ -12,39 +12,18 @@ **Point of contact**: Sean Lucey -**Public availability statement**: Source data are available to qualified researchers upon request (see "Access Information" [here](https://inport.nmfs.noaa.gov/inport/item/22560)). Derived data used in SOE reports are available [here](https://comet.nefsc.noaa.gov/erddap/tabledap/group_landings_soe_v1.html). +**Public availability statement**: Source data are available to qualified researchers upon request (see "Access Information" [here](https://inport.nmfs.noaa.gov/inport/item/22560)). -NO SURVEYS IS 2020 +***Note: Due to the COVID-19 pandemic there were no surveys in 2020*** ## Methods -The Northeast Fisheries Science Center (NEFSC) has been conducting standardized bottom trawl surveys -in the fall since 1963 and spring since 1968. The surveys follow a stratified random design. Fish -species and several invertebrate species are enumerated on a tow by tow basis [@Azarovitz1981]. -The data are housed in the NEFSC's survey database (SVDBS) maintained by the Ecosystem Survey Branch. 
- -Direct pulls from the database are not advisable as there have been several gear modifications and -vessel changes over the course of the time series [@Miller_2010]. Survdat was developed as a database -query that applies the appropriate calibration factors for a seamless time series since the 1960s. -As such, it is the base for many of the other analyses conducted for the State of the Ecosystem -report that involve fisheries independent data. - -The Survdat script can be broken down into two sections. The first pulls the raw data from SVDBS. -While the script is able to pull data from more than just the spring and fall bottom trawl surveys, -for the purposes of the State of the Ecosystem reports only the spring and fall data are used. -Survdat identifies those research cruises associated with the seasonal bottom trawl surveys and pulls -the station and biological data. Station data includes tow identification (cruise, station, -and stratum), tow location and date, as well as several environmental variables (depth, surface/bottom salinity, -and surface/bottom temperature). Stations are filtered for representativness using a station, haul, gear -(SHG) code for tows prior to 2009 and a tow, operations, gear, and aquisition (TOGA) code from 2009 -onward. The codes that correspond to a representative tow (SHG <= 136 or TOGA <= 1324) are the same -used by assessment biologists at the NEFSC. Biological data includes the total biomass and abundance -by species, as well as lengths and number at length. - -The second section of the Survdat script applies the calibration factors. There are four calibrartion -factors applied (Table \@ref(tab:calibration)). Calibration factors are pulled directly from SVDBS. Vessel conversions were made from -either the NOAA Ship *Delaware II* or NOAA Ship *Henry Bigelow* to the NOAA Ship *Albatross IV* which was -the primary vessel for most of the time series. 
The Albatross was decommisioned in 2009 and the Bigelow is -now the primary vessel for the bottom trawl survey. +The Northeast Fisheries Science Center (NEFSC) has been conducting standardized bottom trawl surveys in the fall since 1963 and spring since 1968. The surveys follow a stratified random design. Fish species and several invertebrate species are enumerated on a tow-by-tow basis [@Azarovitz1981]. The data are housed in the NEFSC's survey database (SVDBS), which is maintained by the Ecosystem Survey Branch. + +Direct pulls from the database are not advisable because there have been several gear modifications and vessel changes over the course of the time series [@Miller_2010]. `survdat` was developed as a database query that applies the appropriate calibration factors to produce a seamless time series since the 1960s. As such, it is the basis for many of the other analyses conducted for the State of the Ecosystem report that involve fisheries-independent data. + +The R package [`survdat`](https://noaa-edab.github.io/survdat/) is used to pull and process the data. For the State of the Ecosystem reports, only the spring and fall survey data are used. `survdat` identifies the research cruises associated with the seasonal bottom trawl surveys and pulls the station and biological data. Station data include tow identification (cruise, station, and stratum), tow location and date, and several environmental variables (depth, surface/bottom salinity, and surface/bottom temperature). Stations are filtered for representativeness using a station, haul, gear (SHG) code for tows prior to 2009 and a tow, operations, gear, and acquisition (TOGA) code from 2009 onward. The codes that correspond to a representative tow (SHG <= 136 or TOGA <= 1324) are the same thresholds used by assessment biologists at the NEFSC. Biological data include the total biomass and abundance by species, as well as lengths and numbers at length. + +`survdat` applies the calibration factors. 
Four calibration factors are applied (Table \@ref(tab:calibration)). Calibration factors are pulled directly from SVDBS. Vessel conversions were made from either the NOAA Ship *Delaware II* or NOAA Ship *Henry Bigelow* to the NOAA Ship *Albatross IV*, which was the primary vessel for most of the time series. The Albatross was decommissioned in 2009 and the Bigelow is now the primary vessel for the bottom trawl survey. ```{r calibration, eval = T, echo = F} cal.factors <- data.frame(Name = c('Door Conversion', 'Net Conversion', 'Vessel Conversion I', 'Vessel Conversion II'), @@ -54,25 +33,23 @@ kable(cal.factors, booktabs = TRUE, caption = "Calibration factors for NEFSC trawl survey data") ``` -The output from Survdat is an RData file that contains all the station and biological data, corrected -as noted above, from the NEFSC Spring Bottom Trawl Survey and NEFSC Fall Bottom Trawl Survey. The RData -file is a data.table, a powerful wrapper for the base data.frame (https://cran.r-project.org/web/packages/data.table/data.table.pdf). -There are also a series of tools that have been developed in order to utilize the Survdat data set -(https://github.com/NOAA-EDAB/survdat). ### Data sources -Survdat is a database query of the NEFSC survey database (SVDBS).These data are available to qualified researchers upon request. More information on the data request process is available under the "Access Information" field [here](https://inport.nmfs.noaa.gov/inport/item/22560). + +[`survdat`](https://noaa-edab.github.io/survdat/) is an R package that allows for queries of the NEFSC survey database (SVDBS). These data are available to qualified researchers upon request. More information on the data request process is available under the "Access Information" field [here](https://inport.nmfs.noaa.gov/inport/item/22560). ### Data extraction -Extraction methods are described above. 
The R code found [here](https://noaa-edab.github.io/survdat/) was used in the survey data extraction process. + +Extraction methods are described above. The R package [`survdat`](https://noaa-edab.github.io/survdat/) was used in the survey data extraction process. ### Data analysis -The fisheries independent data contained within the Survdat is used in a variety of + +The fisheries independent data obtained using `survdat` is used in a variety of products; the more complicated analyses are detailed in their own sections. The most straightforward use of this data is for the resource species aggregate biomass indicators. For the purposes of the aggregate biomass indicators, fall and spring survey data are treated separately. Additionally, all length data is dropped and -species seperated by sex at the catch level are merged back together. +species separated by sex at the catch level are merged back together. Since 2020, survey strata where characterized as being within an [Ecological Production Unit](#epu) based on where at least 50% of the area of the strata was located (Figure \@ref(fig:epustrata). While this does not create a perfect match for the EPU boundaries it allows us to calculate the variance associated with the index as the survey was designed. @@ -84,7 +61,7 @@ knitr::include_graphics(file.path(image.dir,"EPU_Designations_Map.jpg")) ``` -Prior to 2020, Survdat was first post stratified into EPUs by labeling stations by the EPU they fell within using the `over` function from the `rgdal` R package [@rgdal]. Next, the total number of stations within each EPU per year is counted using unique station records. Biomass is summed by species per year per EPU. Those sums are divided by the appropriate station count to get the EPU mean. Finally, the mean biomasses are summed by [aggregate groups](#aggroups). 
These steps are encompassed in the [processing code](https://github.com/NOAA-EDAB/ecodata/blob/master/data-raw/get_agg_bio.R), which also includes steps taken to format the data set for inclusion in the `ecodata` R package. +Prior to 2020, `survdat` data were post-stratified into EPUs by labeling each station with the EPU that contained it. The total number of stations within each EPU per year was counted using unique station records. Biomass was summed by species per year per EPU. Those sums were divided by the appropriate station count to get the EPU mean. Finally, the mean biomasses were summed by [aggregate groups](#species_groupings). **catalog link** No associated catalog page \ No newline at end of file From 5114af3ddd1e2234928b66e28fdb8c8e3f86f6b6 Mon Sep 17 00:00:00 2001 From: andybeet <22455149+andybeet@users.noreply.github.com> Date: Thu, 21 Mar 2024 21:34:24 -0400 Subject: [PATCH 3/3] edited ecosystem overfishing rmd --- chapters/ecosystem_overfishing.Rmd | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/chapters/ecosystem_overfishing.Rmd b/chapters/ecosystem_overfishing.Rmd index a7ae7660..a7987e53 100644 --- a/chapters/ecosystem_overfishing.Rmd +++ b/chapters/ecosystem_overfishing.Rmd @@ -46,7 +46,7 @@ $$PPR_t = \sum_{i=1}^{n_t} \left(\frac{landings_{t,i}}{9}\right) \left(\frac{1} where $n_t$ = number of species in time $t$, $landings_{t,i}$ = landings of species $i$ in time $t$, $TL_i$ is the trophic level of species $i$, $TE$ = Trophic efficiency. The PPR estimate assumes a 9:1 ratio for the conversion of wet weight to carbon and a 15% transfer efficiency per trophic level, ($TE$ = 0.15) -The index is presented as a percentage of [estimated primary production](https://noaa-edab.github.io/tech-doc/chl-pp.html) (PP) available over the geographic region of interest, termed an [Ecological Production Unit](https://noaa-edab.github.io/tech-doc/comdat.html) (EPU). 
The scaled index is estimated by dividing the PPR index in year $t$ by the estimated primary production in time $t$. +The index is presented as a percentage of [estimated primary production](#chl_pp) (PP) available over the geographic region of interest, termed an [Ecological Production Unit](#epu) (EPU). The scaled index is estimated by dividing the PPR index in year $t$ by the estimated primary production in year $t$. $$scaledPPR_t = \frac{PPR_t}{PP_t}$$ @@ -54,15 +54,15 @@ The species selected in each year were determined by their cumulative contributi #### Data sources -Data for this index come from a variety of sources. The landings data come from the Commercial Fishery Database (CFDBS), species trophic level information come from [fishbase](http://fishbase.de) and [sealifebase](http://sealifebase.ca), and primary production estimates are derived from [satellites](https://noaa-edab.github.io/tech-doc/chl-pp.html). Some of these data are typically not available to the public. +Data for this index come from a variety of sources. The landings data come from the Commercial Fishery Database (CFDBS), species trophic level information comes from [fishbase](http://fishbase.de) and [sealifebase](http://sealifebase.ca), and primary production estimates are derived from [satellites](#chl_pp). Some of these data are typically not available to the public. 
 #### Data extraction -Landings are extracted from the commercial fisheries database (CFDBS) using the methods described in the chapter [Commercial Landings Data.](https://noaa-edab.github.io/tech-doc/comdat.html) +Landings are extracted from the commercial fisheries database (CFDBS) using the methods described in the chapter [Commercial Landings Data](#comdat). -Trophic level information for each species is obtained from [fishbase](http://fishbase.de) and [sealifebase](http://sealifebase.ca) using the R package [rfishbase](https://github.com/ropensci/rfishbase) [@froese2019fishbase] in tandem with the package [eofindices.](https://github.com/NOAA-EDAB/eofindices/) +Trophic level information for each species is obtained from [fishbase](http://fishbase.de) and [sealifebase](http://sealifebase.ca) using the R package [rfishbase](https://github.com/ropensci/rfishbase) [@froese2019fishbase] in tandem with the package [eofindices](https://noaa-edab.github.io/eofindices/). -Primary Production is estimated using the methods described in the chapter [Chlorophyll a and Primary Production.](https://noaa-edab.github.io/tech-doc/chl-pp.html) +Primary Production is estimated using the methods described in the chapter [Chlorophyll a and Primary Production](#chl_pp). #### Data analysis @@ -70,7 +70,7 @@ Primary Production is estimated using the methods described in the chapter [Chlo Annual (wet weight) landings are calculated for each species for a given EPU. For each year the landings are sorted in descending order by species and the cumulative landings are calculated. The species that accounted for the top 80% of total cumulative landings are selected. The trophic level for each of these species are then obtained from fishbase/sealifebase. At this point the PPR index is calculated. The units of the index are $gCyear^{-1}$ for the EPU. The index is converted to $gCm^{-2}year^{-1}$ by dividing by the area (in $m^2$) of the EPU. 
-To normalize the index the total Primary Production for the given EPU is required. This is calculated as described in the chapter [Chlorophyll a and Primary Production](https://noaa-edab.github.io/tech-doc/chl-pp.html). The units are also converted to $gCm^{-2}year^{-1}$. +To normalize the index, the total Primary Production for the given EPU is required. This is calculated as described in the chapter [Chlorophyll a and Primary Production](#chl_pp). The units are also converted to $gCm^{-2}year^{-1}$. The index is then normalized by dividing the index in year t by the total primary production in time $t$.
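The PPR steps above (top-80% species selection, wet-weight-to-carbon conversion, trophic scaling, division by EPU area, and normalization by primary production) can be sketched in R for a single EPU-year. This is a hypothetical illustration, not the `eofindices` or `ecodata` code: the function `ppr_index`, its arguments, and the example numbers are invented. Because the PPR equation in the hunk header is truncated, the exponent $(TL_i - 1)$ on $1/TE$ is an assumption taken from the widely used Pauly and Christensen (1995) formulation, and landings are assumed to be in grams wet weight (multiply `comlandr` metric tons by $10^6$).

```r
# Hypothetical sketch of the PPR steps; not the eofindices/ecodata implementation.
# Assumptions: landings_g are grams wet weight per species for one EPU-year,
# the trophic-level exponent is (TL - 1), and "top 80%" keeps species up to and
# including the one whose cumulative share first reaches the cutoff.
ppr_index <- function(landings_g, trophic_level, area_m2, pp_gc_m2,
                      te = 0.15, carbon_ratio = 9, cutoff = 0.80) {
  # Sort species by landings, largest first
  ord <- order(landings_g, decreasing = TRUE)
  landings_g <- landings_g[ord]
  trophic_level <- trophic_level[ord]

  # Keep the species that make up the top 80% of cumulative landings
  cum_share <- cumsum(landings_g) / sum(landings_g)
  n_keep <- which(cum_share >= cutoff)[1]
  keep <- seq_len(n_keep)

  # PPR in gC per year: wet weight to carbon (9:1), then 1/TE per trophic step
  ppr <- sum((landings_g[keep] / carbon_ratio) *
               (1 / te)^(trophic_level[keep] - 1))

  # Per unit area (gC m^-2 year^-1), then scaled by primary production
  ppr_m2 <- ppr / area_m2
  list(n_species = n_keep, ppr = ppr, ppr_m2 = ppr_m2,
       scaled = ppr_m2 / pp_gc_m2)
}

# Illustrative inputs only (not real landings, trophic levels, or PP values)
res <- ppr_index(landings_g = c(3e9, 6e9, 5e8, 5e8),
                 trophic_level = c(3, 2, 4, 4),
                 area_m2 = 1e11, pp_gc_m2 = 300)
res$n_species      # 2: the two largest species reach 90% of landings
100 * res$scaled   # index expressed as a percentage of PP
```

The cutoff rule here (include the species that crosses 80%) is one reading of "top 80%"; excluding that species instead would change `n_species` at the margin.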
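The area-assignment step in the landings chapter above, where landings without a statistical area are apportioned according to the area proportions of known-area landings that share the same temporal/vessel/gear combination, can be sketched in base R. This is a hypothetical simplification, not `comlandr`'s implementation: the function name and the `quarter`, `size`, `gear`, `area`, and `mt` columns are assumed, and it handles only a missing area, not the other missing attributes the algorithm also fills.

```r
# Hypothetical sketch, not comlandr code: apportion landings with an unknown
# statistical area (area = NA) using the area proportions of known-area landings
# in the same quarter / vessel size / gear combination. Column names are assumed.
apportion_unknown_area <- function(df) {
  df$key <- paste(df$quarter, df$size, df$gear, sep = "_")
  known <- df[!is.na(df$area), ]
  unknown <- df[is.na(df$area), ]
  if (nrow(unknown) == 0) {
    return(aggregate(mt ~ key + area, data = known, FUN = sum))
  }

  # Share of known-area landings falling in each area, within each combination
  props <- aggregate(mt ~ key + area, data = known, FUN = sum)
  props$prop <- props$mt / ave(props$mt, props$key, FUN = sum)

  # Total unknown-area landings per combination, split by those shares.
  # Combinations with no known-area trips are dropped here; a fuller
  # implementation would need a fallback (e.g., a coarser grouping).
  unk <- aggregate(mt ~ key, data = unknown, FUN = sum)
  alloc <- merge(unk, props[, c("key", "area", "prop")], by = "key")
  alloc$mt <- alloc$mt * alloc$prop

  # Recombine known and newly allocated landings
  out <- rbind(known[, c("key", "area", "mt")], alloc[, c("key", "area", "mt")])
  aggregate(mt ~ key + area, data = out, FUN = sum)
}

# Illustrative records: 10 t in area 511, 30 t in 512, and 8 t with no area
landings <- data.frame(quarter = 1, size = "small", gear = "otter",
                       area = c(511, 512, 512, NA), mt = c(10, 20, 10, 8))
apportion_unknown_area(landings)
# area 511: 12 t, area 512: 36 t
```

Total landings are conserved (48 t in, 48 t out); only the unassigned portion moves, in the same 1:3 ratio as the known-area trips.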
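The representative-tow rule in the survey chapter above (SHG <= 136 for tows before 2009, TOGA <= 1324 from 2009 onward) can be expressed as a small R predicate. This is an illustrative sketch rather than the `survdat` source; the function name, argument names, and the assumption that SHG and TOGA are stored as numbers are hypothetical.

```r
# Hypothetical sketch, not survdat code: flag representative tows using the
# SHG threshold for tows before 2009 and the TOGA threshold from 2009 onward.
# Argument names and numeric coding of SHG/TOGA are assumptions.
is_representative_tow <- function(year, shg, toga) {
  ok <- ifelse(year < 2009, shg <= 136, toga <= 1324)
  !is.na(ok) & ok  # a missing code is treated as not representative
}

tows <- data.frame(year = c(2005, 2005, 2012, 2012),
                   shg  = c(136, 141, NA, NA),
                   toga = c(NA, NA, 1324, 1325))
tows[is_representative_tow(tows$year, tows$shg, tows$toga), ]
# keeps rows 1 and 3
```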
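The pre-2020 post-stratified EPU means described in the survey chapter above (station counts per EPU-year, species biomass sums divided by those counts, then species means summed by aggregate group) can be sketched in base R. This is a hypothetical illustration, not the `survdat` or `ecodata` code; the function name and the `year`, `epu`, `station`, `species`, `biomass`, and `group` columns are assumed, and stations are passed separately so that tows with no catch still count toward the station total. It is a simple mean per station, not the design-based stratified estimator used since 2020.

```r
# Hypothetical sketch, not survdat/ecodata code, of the pre-2020 steps.
# Column names are assumed; group labels below are illustrative only.
epu_mean_biomass <- function(stations, catch, groups) {
  # Number of unique stations sampled per year and EPU
  stations <- unique(stations[, c("year", "epu", "station")])
  n_sta <- aggregate(station ~ year + epu, data = stations, FUN = length)
  names(n_sta)[names(n_sta) == "station"] <- "n_station"

  # Sum biomass by species within each year/EPU, then divide by station count
  tot <- aggregate(biomass ~ year + epu + species, data = catch, FUN = sum)
  tot <- merge(tot, n_sta, by = c("year", "epu"))
  tot$mean_biomass <- tot$biomass / tot$n_station

  # Sum the species means within each aggregate group
  tot <- merge(tot, groups, by = "species")
  aggregate(mean_biomass ~ year + epu + group, data = tot, FUN = sum)
}

stations <- data.frame(year = 2019, epu = "GB", station = 1:4)
catch <- data.frame(year = 2019, epu = "GB", station = c(1, 1, 2, 3),
                    species = c("cod", "haddock", "cod", "haddock"),
                    biomass = c(4, 2, 6, 10))
groups <- data.frame(species = c("cod", "haddock"),
                     group = c("Piscivores", "Benthivores"))
epu_mean_biomass(stations, catch, groups)
# Piscivores: 2.5, Benthivores: 3
```

Station 4 caught nothing, yet it still counts toward the denominator, which is why the station tally comes from unique station records rather than from the catch rows.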