diff --git a/.Rbuildignore b/.Rbuildignore
index 43d08af..c130539 100644
--- a/.Rbuildignore
+++ b/.Rbuildignore
@@ -8,3 +8,4 @@
 ^vignettes/articles$
 ^README\.Rmd$
 ^\.github$
+^cran-comments\.md$
diff --git a/NEWS.md b/NEWS.md
index 1cc3f65..083a2cb 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,7 @@
+# BGmisc 1.0
+
+* Added major update to include simulations, plotting, and examples.
+
 # BGmisc 0.1
 
 * Added a `NEWS.md` file to track changes to the package.
diff --git a/README.Rmd b/README.Rmd
index 857b5bf..f8cfed2 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -20,7 +20,7 @@ knitr::opts_chunk$set(
 [![R package version](https://www.r-pkg.org/badges/version/BGmisc)](https://cran.r-project.org/package=BGmisc)
 [![Package downloads](https://cranlogs.r-pkg.org/badges/grand-total/BGmisc)](https://cran.r-project.org/package=BGmisc)
 [![.github/workflows/draft-pdf.yml](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/draft-pdf.yml/badge.svg)](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/draft-pdf.yml)
-[![R-CMD-check](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml/badge.svg)](hhttps://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml)
+[![R-CMD-check](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml)
 ![License](https://img.shields.io/badge/License-GPL_v3-blue.svg)
diff --git a/vignettes/articles/paper.Rmd b/vignettes/articles/paper.Rmd
index a8a0273..a6638cb 100644
--- a/vignettes/articles/paper.Rmd
+++ b/vignettes/articles/paper.Rmd
@@ -71,27 +71,25 @@ Traditionally, twin studies have been at the forefront of this discipline. Howev
 # Statement of need
-As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. The `BGmisc` R package addresses these challenges, going beyond what is available in tools like `OpenMx` and `EasyMx`, which mainly focus on classical twin models.
+As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. Understandably, packages like `OpenMx` [@Neale2016], `EasyMx` [@easy], and `kinship2` [@kinship2; @kinship2R] were built for smaller families and classical designs. In contrast, the `BGmisc` R package was specifically developed to structure and model extended family pedigree data.
-Two widely used R packages in genetics modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. The `OpenMx` [@Neale2016] package is a workhorse in behavior genetic research. Not only is it a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018], but also for its unique features -- the `mxCheckIdentification()` function. This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, or simulating pedigrees.
+Two widely-used R packages in genetic modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. The `OpenMx` [@Neale2016] package is a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018] for its unique features, like the `mxCheckIdentification()` function. This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, and simulating pedigrees.
 Although not a staple in behavior genetics, the `kinship2` [@kinship2] package provides core features to the broader statistical genetics scientific community, such as plotting pedigrees and computing genetic relatedness matrices. It uses the Lange algorithm [@lange_genetic_2002] to compute relatedness coefficients. This recursive algorithm is discussed in great detail elsewhere, laying out several boundary conditions and recurrence rules. The `BGmisc` package extends the capabilities of `kinship2` by introducing an alternative algorithm to calculate the relatedness coefficient based on network models. By applying classic path-tracing rules to the entire network, this new method is computationally more efficient by eliminating the need for a multi-step recursive approach.
 ## Features
-The `BGmisc` package offers various features tailored for extended behavior genetics analysis. These features are grouped under two main categories, mirroring the structure presented in our vignettes.
-
+The `BGmisc` package offers features tailored for extended behavior genetics analysis. These features are grouped under two main categories, mirroring the structure presented in our vignettes.
 ### Modeling and Relatedness:
 - Model Identification: `BGmisc` evaluates whether a variance components model is identified and fits the model's estimated variance components to observed covariance data. The technical aspects related to model identification have been described by @hunter_analytic_2021.
-- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based soley on mother and father identifiers.
+- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based solely on mother and father identifiers.
 - Relatedness Inference: `BGmisc` infers the relatedness between two groups based on their observed total correlation, given additive genetic and shared environmental parameters.
-
 ### Pedigree Analysis and Simulation:
 - Pedigree Conversion: `BGmisc` converts pedigrees into various relatedness matrices, including additive genetics, mitochondrial, common nuclear, and extended environmental relatedness matrices.
@@ -99,6 +97,7 @@ The `BGmisc` package offers various features tailored for extended behavior gene
 - Pedigree Simulation: `BGmisc` simulates pedigrees based on parameters including the number of children per mate, generations, sex ratio of newborns, and mating rate.
+
 Collectively, these tools provide a valuable resource for behavior geneticists and others who work with extended family data. They were developed as part of a grant and have been used in several ongoing projects [@lyu_statistical_power_2023; @hunter_modeling_2023; @garrison_analyzing_2023; @burt_mom_genes_2023] and theses [@lyu_masters_thesis_2023].
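Reviewer note: the "Relatedness Coefficient Calculation" bullet in the hunk above describes path tracing over mother and father identifiers. The idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the `BGmisc` implementation: the function name `additive_relatedness` and the toy pedigree are hypothetical, the column names `ID`, `dadID`, and `momID` follow the simulated output shown later in this diff, and the pedigree is assumed to contain no inbreeding.

```r
# Hypothetical helper, not part of BGmisc: additive relatedness by path tracing.
# Assumes a non-inbred pedigree with columns ID, dadID, momID (NA for unknown).
additive_relatedness <- function(ped) {
  n <- nrow(ped)
  dad <- match(ped$dadID, ped$ID)
  mom <- match(ped$momID, ped$ID)
  has_dad <- !is.na(dad)
  has_mom <- !is.na(mom)

  # P[i, j] = 0.5 when j is a parent of i: one tracing step with weight 1/2.
  P <- matrix(0, n, n)
  P[cbind(which(has_dad), dad[has_dad])] <- 0.5
  P[cbind(which(has_mom), mom[has_mom])] <- 0.5

  # (I - P)^-1 = I + P + P^2 + ... accumulates every ancestor-to-descendant path.
  paths <- solve(diag(n) - P)

  # Variance not inherited from known parents: 1 for founders, 0.5 when both
  # parents are known (0.75 when only one is), valid only without inbreeding.
  d <- 1 - 0.25 * (has_dad + has_mom)

  A <- paths %*% diag(d, nrow = n) %*% t(paths)
  dimnames(A) <- list(ped$ID, ped$ID)
  A
}

# Toy pedigree: founders 1, 2, 5; full siblings 3 and 4; 6 is the child of 5 and 3.
ped <- data.frame(
  ID    = c(1, 2, 3, 4, 5, 6),
  dadID = c(NA, NA, 1, 1, NA, 5),
  momID = c(NA, NA, 2, 2, NA, 3)
)
A <- additive_relatedness(ped)
A["3", "4"]  # full siblings
A["6", "1"]  # grandchild and grandparent
```

Solving one linear system covers the whole pedigree at once, which is the contrast with a per-pair recursion that the paper text draws. A dense `solve()` is used here only for brevity; the paper describes sparse coefficients.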
diff --git a/vignettes/articles/paper.md b/vignettes/articles/paper.md
index b1164e3..c299636 100644
--- a/vignettes/articles/paper.md
+++ b/vignettes/articles/paper.md
@@ -34,7 +34,7 @@ affiliations:
     index: 4
  - name: Department of Psychology, Michigan State University, Michigan, USA
    index: 5
-date: "12 September, 2023"
+date: "19 September, 2023"
 bibliography: paper.bib
 vignette: >
   %\VignetteEncoding{UTF-8}
@@ -69,27 +69,25 @@ Traditionally, twin studies have been at the forefront of this discipline. Howev
 # Statement of need
-As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. The `BGmisc` R package addresses these challenges, going beyond what is available in tools like `OpenMx` and `EasyMx`, which mainly focus on classical twin models.
+As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. Understandably, packages like `OpenMx` [@Neale2016], `EasyMx` [@easy], and `kinship2` [@kinship2; @kinship2R] were built for smaller families and classical designs. In contrast, the `BGmisc` R package was specifically developed to structure and model extended family pedigree data.
-Two widely used R packages in genetics modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. The `OpenMx` [@Neale2016] package is a workhorse in behavior genetic research. Not only is it a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018], but also for its unique features -- the `mxCheckIdentification()` function. This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, or simulating pedigrees.
+Two widely-used R packages in genetic modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. The `OpenMx` [@Neale2016] package is a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018] for its unique features, like the `mxCheckIdentification()` function. This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, and simulating pedigrees.
 Although not a staple in behavior genetics, the `kinship2` [@kinship2] package provides core features to the broader statistical genetics scientific community, such as plotting pedigrees and computing genetic relatedness matrices. It uses the Lange algorithm [@lange_genetic_2002] to compute relatedness coefficients. This recursive algorithm is discussed in great detail elsewhere, laying out several boundary conditions and recurrence rules. The `BGmisc` package extends the capabilities of `kinship2` by introducing an alternative algorithm to calculate the relatedness coefficient based on network models. By applying classic path-tracing rules to the entire network, this new method is computationally more efficient by eliminating the need for a multi-step recursive approach.
 ## Features
-The `BGmisc` package offers various features tailored for extended behavior genetics analysis. These features are grouped under two main categories, mirroring the structure presented in our vignettes.
-
+The `BGmisc` package offers features tailored for extended behavior genetics analysis. These features are grouped under two main categories, mirroring the structure presented in our vignettes.
 ### Modeling and Relatedness:
 - Model Identification: `BGmisc` evaluates whether a variance components model is identified and fits the model's estimated variance components to observed covariance data. The technical aspects related to model identification have been described by @hunter_analytic_2021.
-- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based soley on mother and father identifiers.
+- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based solely on mother and father identifiers.
 - Relatedness Inference: `BGmisc` infers the relatedness between two groups based on their observed total correlation, given additive genetic and shared environmental parameters.
-
 ### Pedigree Analysis and Simulation:
 - Pedigree Conversion: `BGmisc` converts pedigrees into various relatedness matrices, including additive genetics, mitochondrial, common nuclear, and extended environmental relatedness matrices.
@@ -97,6 +95,7 @@ The `BGmisc` package offers various features tailored for extended behavior gene
 - Pedigree Simulation: `BGmisc` simulates pedigrees based on parameters including the number of children per mate, generations, sex ratio of newborns, and mating rate.
+
 Collectively, these tools provide a valuable resource for behavior geneticists and others who work with extended family data. They were developed as part of a grant and have been used in several ongoing projects [@lyu_statistical_power_2023; @hunter_modeling_2023; @garrison_analyzing_2023; @burt_mom_genes_2023] and theses [@lyu_masters_thesis_2023].
diff --git a/vignettes/pedigree.html b/vignettes/pedigree.html
index 4c6ad9d..d284845 100644
--- a/vignettes/pedigree.html
+++ b/vignettes/pedigree.html
@@ -350,47 +350,53 @@

Loading Required Libraries

library(BGmisc)
-
-Simulating Pedigree
-Unlike Tolstoy, where only happy families are alike, all pedigrees
-are alike. Moreover, such pedigrees can be simulated as a function of
-several parameters, including the number of children per mate,
-generations, sex ratio of newborns, and mating rate
-The simulation function provides users the opportunity to test family
-models in pedigrees with a customized pedigree length and width. Since
-data in the form of large family pedigrees is difficult to collect or
-access, simulated pedigrees serve as an efficient tool for building
-statistical models using family data and evaluating the statistical
-properties of the model, such as power, bias, and computational
+
+Simulating Pedigrees
+Unlike Tolstoy, where only happy families are alike, all
+pedigrees are alike – or at least, all simulated pedigrees are alike.
+The simulatePedigree function generates a pedigree with a
+user-specified number of generations and individuals per generation.
+This function provides users the opportunity to test family models in
+pedigrees with a customized pedigree length and width.
+These pedigrees can be simulated as a function of several parameters,
+including the number of children per mate, generations, sex ratio of
+newborns, and mating rate. Given that large family pedigrees are
+difficult to collect or access, simulated pedigrees serve as an
+efficient tool for researchers. These simulated pedigrees are useful for
+building family-based statistical models, and evaluating their
+statistical properties, such as power, bias, and computational
 efficiency.
-For example, a pedigree that follows these conditions: There are a
-total of four generations in which each mating produces four offspring.
-The number of male and female newborns is equal. 70% of individuals mate
-and bear offspring. Such a pedigree structure can be simulated by
+To illustrate this, let us generate a pedigree. This pedigree has a
+total of four generations, in which each person who “mates”, grows a
+family with four offspring. In our scenario, the number of male and
+female newborns is equal. In this illustration 70% of individuals will
+mate and bear offspring. Such a pedigree structure can be simulated by
 running:

set.seed(5)
-Ped <- SimPed(kpc = 4, Ngen = 4, sexR = .5, marR = .7)
-
-summary(Ped)
-#>      fam                  ID              gen            dadID       
-#>  Length:57          Min.   : 10011   Min.   :1.000   Min.   : 10012  
-#>  Class :character   1st Qu.: 10036   1st Qu.:3.000   1st Qu.: 10024  
-#>  Mode  :character   Median :100312   Median :3.000   Median : 10037  
-#>                     Mean   : 59171   Mean   :3.298   Mean   : 42859  
-#>                     3rd Qu.:100416   3rd Qu.:4.000   3rd Qu.:100311  
-#>                     Max.   :100432   Max.   :4.000   Max.   :100320  
-#>                                                      NA's   :13      
-#>      momID             spt             sex           
-#>  Min.   : 10011   Min.   : 10011   Length:57         
-#>  1st Qu.: 10022   1st Qu.: 10025   Class :character  
-#>  Median : 10036   Median : 10036   Mode  :character  
-#>  Mean   : 42859   Mean   : 40124                     
-#>  3rd Qu.:100316   3rd Qu.:100311                     
-#>  Max.   :100318   Max.   :100320                     
-#>  NA's   :13       NA's   :33
+df_ped <- simulatePedigree(kpc = 4,
+                           Ngen = 4,
+                           sexR = .5,
+                           marR = .7)
+summary(df_ped)
+#>      fam                  ID              gen            dadID       
+#>  Length:57          Min.   : 10011   Min.   :1.000   Min.   : 10012  
+#>  Class :character   1st Qu.: 10036   1st Qu.:3.000   1st Qu.: 10024  
+#>  Mode  :character   Median :100312   Median :3.000   Median : 10037  
+#>                     Mean   : 59171   Mean   :3.298   Mean   : 42859  
+#>                     3rd Qu.:100416   3rd Qu.:4.000   3rd Qu.:100311  
+#>                     Max.   :100432   Max.   :4.000   Max.   :100320  
+#>                                                      NA's   :13      
+#>      momID             spt             sex           
+#>  Min.   : 10011   Min.   : 10011   Length:57         
+#>  1st Qu.: 10022   1st Qu.: 10025   Class :character  
+#>  Median : 10036   Median : 10036   Mode  :character  
+#>  Mean   : 42859   Mean   : 40124                     
+#>  3rd Qu.:100316   3rd Qu.:100311                     
+#>  Max.   :100318   Max.   :100320                     
+#>  NA's   :13       NA's   :33

The simulation output is a data.frame with 57 rows and 7 columns. Each row corresponds to a simulated individual.

-Ped[21, ]
+df_ped[21, ]
 #>      fam     ID gen dadID momID    spt sex
 #> 21 fam 1 100312   3 10024 10022 100317   M

The columns represents the individual’s family ID, the individual’s
@@ -410,10 +416,10 @@

Single Pedigree Visualization

To visualize a single simulated pedigree, use the plotPedigree() function.

# Plot the simulated pedigree
-plotPedigree(Ped)
+plotPedigree(df_ped)
 #> Pedigree object with 57 subjects, family id= 1 
 #> Bit size= 75
-

+

#> Did not plot the following people: 10032

In the resulting plot, biological males are represented by squares, while biological females are represented by circles, following the
@@ -425,10 +431,10 @@

Visualizing Multiple Pedigrees Side-by-Side

them together. For instance, let’s visualize pedigrees for families spanning three and four generations, respectively.

# Simulate a family with 3 generations
-df_ped_3 <- SimPed(Ngen = 3)
+df_ped_3 <- simulatePedigree(Ngen = 3)
 
 # Simulate a family with 4 generations
-df_ped_4 <- SimPed(Ngen = 4)
+df_ped_4 <- simulatePedigree(Ngen = 4)
 
 # Set up plotting parameters for side-by-side display
 par(mfrow = c(1, 2))
@@ -442,7 +448,7 @@ 

Visualizing Multiple Pedigrees Side-by-Side

 plotPedigree(df_ped_4, width = 1)
 #> Pedigree object with 29 subjects, family id= 1 
 #> Bit size= 34
-

+

By examining the side-by-side plots, you can contrast and analyze the structures of different families, tracing the inheritance of specific traits or conditions if needed.
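
Reviewer note: the side-by-side example in this hunk calls `par(mfrow = c(1, 2))`, which changes a global graphics setting for every later plot in the session. A minimal sketch of saving and restoring that setting, using only base R graphics, could look like this. The two `plot()` calls are hypothetical stand-ins for the `plotPedigree()` calls so that the snippet runs without `BGmisc` installed.

```r
# par() returns the previous values of the settings it changes,
# so they can be captured at the same time as the layout is set.
old_par <- par(mfrow = c(1, 2))

# Hypothetical stand-ins for plotPedigree(df_ped_3) and plotPedigree(df_ped_4).
plot(1:3, main = "Three generations")
plot(1:4, main = "Four generations")

# Restore the earlier layout so later plots are not squeezed into two panels.
par(old_par)
```

Whether the vignette should reset the layout is a judgment call for the authors; the sketch only shows the base R idiom.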