diff --git a/.Rbuildignore b/.Rbuildignore index 43d08af..c130539 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -8,3 +8,4 @@ ^vignettes/articles$ ^README\.Rmd$ ^\.github$ +^cran-comments\.md$ diff --git a/NEWS.md b/NEWS.md index 1cc3f65..083a2cb 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,7 @@ +# BGmisc 1.0 + +* Added major update to include simulations, plotting, and examples. + # BGmisc 0.1 * Added a `NEWS.md` file to track changes to the package. diff --git a/README.Rmd b/README.Rmd index 857b5bf..f8cfed2 100644 --- a/README.Rmd +++ b/README.Rmd @@ -20,7 +20,7 @@ knitr::opts_chunk$set( [![R package version](https://www.r-pkg.org/badges/version/BGmisc)](https://cran.r-project.org/package=BGmisc) [![Package downloads](https://cranlogs.r-pkg.org/badges/grand-total/BGmisc)](https://cran.r-project.org/package=BGmisc) [![.github/workflows/draft-pdf.yml](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/draft-pdf.yml/badge.svg)](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/draft-pdf.yml) -[![R-CMD-check](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml/badge.svg)](hhttps://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml) +[![R-CMD-check](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/R-Computing-Lab/BGmisc/actions/workflows/R-CMD-check.yaml) ![License](https://img.shields.io/badge/License-GPL_v3-blue.svg) diff --git a/vignettes/articles/paper.Rmd b/vignettes/articles/paper.Rmd index a8a0273..a6638cb 100644 --- a/vignettes/articles/paper.Rmd +++ b/vignettes/articles/paper.Rmd @@ -71,27 +71,25 @@ Traditionally, twin studies have been at the forefront of this discipline. Howev # Statement of need -As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. 
The `BGmisc` R package addresses these challenges, going beyond what is available in tools like `OpenMx` and `EasyMx`, which mainly focus on classical twin models. +As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. Understandably, packages like `OpenMx` [@Neale2016], `EasyMx` [@easy], and `kinship2` [@kinship2; @kinship2R] were built for smaller families and classical designs. In contrast, the `BGmisc` R package was specifically developed to structure and model extended family pedigree data. -Two widely used R packages in genetics modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. The `OpenMx` [@Neale2016] package is a workhorse in behavior genetic research. Not only is it a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018], but also for its unique features -- the `mxCheckIdentification()` function. This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, or simulating pedigrees. +Two widely-used R packages in genetic modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. 
The `OpenMx` [@Neale2016] package is a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018] for its unique features, like the `mxCheckIdentification()` function. This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, and simulating pedigrees. Although not a staple in behavior genetics, the `kinship2` [@kinship2] package provides core features to the broader statistical genetics scientific community, such as plotting pedigrees and computing genetic relatedness matrices. It uses the Lange algorithm [@lange_genetic_2002] to compute relatedness coefficients. This recursive algorithm is discussed in great detail elsewhere, laying out several boundary conditions and recurrence rules. The `BGmisc` package extends the capabilities of `kinship2` by introducing an alternative algorithm to calculate the relatedness coefficient based on network models. By applying classic path-tracing rules to the entire network, this new method is computationally more efficient by eliminating the need for a multi-step recursive approach. ## Features -The `BGmisc` package offers various features tailored for extended behavior genetics analysis. 
These features are grouped under two main categories, mirroring the structure presented in our vignettes. - +The `BGmisc` package offers features tailored for extended behavior genetics analysis. These features are grouped under two main categories, mirroring the structure presented in our vignettes. ### Modeling and Relatedness: - Model Identification: `BGmisc` evaluates whether a variance components model is identified and fits the model's estimated variance components to observed covariance data. The technical aspects related to model identification have been described by @hunter_analytic_2021. -- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based soley on mother and father identifiers. +- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based solely on mother and father identifiers. - Relatedness Inference: `BGmisc` infers the relatedness between two groups based on their observed total correlation, given additive genetic and shared environmental parameters. - ### Pedigree Analysis and Simulation: - Pedigree Conversion: `BGmisc` converts pedigrees into various relatedness matrices, including additive genetics, mitochondrial, common nuclear, and extended environmental relatedness matrices. @@ -99,6 +97,7 @@ The `BGmisc` package offers various features tailored for extended behavior gene - Pedigree Simulation: `BGmisc` simulates pedigrees based on parameters including the number of children per mate, generations, sex ratio of newborns, and mating rate. + Collectively, these tools provide a valuable resource for behavior geneticists and others who work with extended family data. 
They were developed as part of a grant and have been used in several ongoing projects [@lyu_statistical_power_2023; @hunter_modeling_2023; @garrison_analyzing_2023; @burt_mom_genes_2023] and theses [@lyu_masters_thesis_2023]. diff --git a/vignettes/articles/paper.md b/vignettes/articles/paper.md index b1164e3..c299636 100644 --- a/vignettes/articles/paper.md +++ b/vignettes/articles/paper.md @@ -34,7 +34,7 @@ affiliations: index: 4 - name: Department of Psychology, Michigan State University, Michigan, USA index: 5 -date: "12 September, 2023" +date: "19 September, 2023" bibliography: paper.bib vignette: > %\VignetteEncoding{UTF-8} @@ -69,27 +69,25 @@ Traditionally, twin studies have been at the forefront of this discipline. Howev # Statement of need -As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. The `BGmisc` R package addresses these challenges, going beyond what is available in tools like `OpenMx` and `EasyMx`, which mainly focus on classical twin models. +As behavior genetics delves into more complex data structures like extended pedigrees, the limitations of current tools become evident. Understandably, packages like `OpenMx` [@Neale2016], `EasyMx` [@easy], and `kinship2` [@kinship2; @kinship2R] were built for smaller families and classical designs. In contrast, the `BGmisc` R package was specifically developed to structure and model extended family pedigree data. -Two widely used R packages in genetics modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. The `OpenMx` [@Neale2016] package is a workhorse in behavior genetic research. Not only is it a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018], but also for its unique features -- the `mxCheckIdentification()` function. 
This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, or simulating pedigrees. +Two widely-used R packages in genetic modeling are `OpenMx` [@Neale2016] and `kinship2` [@kinship2; @kinship2R]. The `OpenMx` [@Neale2016] package is a general-purpose software for structural equation modeling that is popular among behavior geneticists [@Garrison2018] for its unique features, like the `mxCheckIdentification()` function. This function checks whether a model is identified, determining if there is a unique solution to estimate the model's parameters based on the observed data. In addition, `EasyMx` [@easy] is a more user-friendly package that streamlines the process of building and estimating structural equation models. It seamlessly integrates with `OpenMx`'s infrastructure. Its functionalities range from foundational matrix builders like `emxCholeskyVariance` and `emxGeneticFactorVariance` to more specialized functions like `emxTwinModel` designed for classical twin models. Despite their strengths, `EasyMx` and `OpenMx` have limitations when handling extended family data. 
Notably, they lack functions for handling modern molecular designs [@kirkpatrick_combining_2021], modeling complex genetic relationships, inferring relatedness, and simulating pedigrees. Although not a staple in behavior genetics, the `kinship2` [@kinship2] package provides core features to the broader statistical genetics scientific community, such as plotting pedigrees and computing genetic relatedness matrices. It uses the Lange algorithm [@lange_genetic_2002] to compute relatedness coefficients. This recursive algorithm is discussed in great detail elsewhere, laying out several boundary conditions and recurrence rules. The `BGmisc` package extends the capabilities of `kinship2` by introducing an alternative algorithm to calculate the relatedness coefficient based on network models. By applying classic path-tracing rules to the entire network, this new method is computationally more efficient by eliminating the need for a multi-step recursive approach. ## Features -The `BGmisc` package offers various features tailored for extended behavior genetics analysis. These features are grouped under two main categories, mirroring the structure presented in our vignettes. - +The `BGmisc` package offers features tailored for extended behavior genetics analysis. These features are grouped under two main categories, mirroring the structure presented in our vignettes. ### Modeling and Relatedness: - Model Identification: `BGmisc` evaluates whether a variance components model is identified and fits the model's estimated variance components to observed covariance data. The technical aspects related to model identification have been described by @hunter_analytic_2021. -- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based soley on mother and father identifiers. 
+- Relatedness Coefficient Calculation: Using path tracing rules first described by @Wright1922 and formalized by @mcardleRAM, `BGmisc` calculates the (sparse) relatedness coefficients between all pairs of individuals in extended pedigrees based solely on mother and father identifiers. - Relatedness Inference: `BGmisc` infers the relatedness between two groups based on their observed total correlation, given additive genetic and shared environmental parameters. - ### Pedigree Analysis and Simulation: - Pedigree Conversion: `BGmisc` converts pedigrees into various relatedness matrices, including additive genetics, mitochondrial, common nuclear, and extended environmental relatedness matrices. @@ -97,6 +95,7 @@ The `BGmisc` package offers various features tailored for extended behavior gene - Pedigree Simulation: `BGmisc` simulates pedigrees based on parameters including the number of children per mate, generations, sex ratio of newborns, and mating rate. + Collectively, these tools provide a valuable resource for behavior geneticists and others who work with extended family data. They were developed as part of a grant and have been used in several ongoing projects [@lyu_statistical_power_2023; @hunter_modeling_2023; @garrison_analyzing_2023; @burt_mom_genes_2023] and theses [@lyu_masters_thesis_2023]. diff --git a/vignettes/pedigree.html b/vignettes/pedigree.html index 4c6ad9d..d284845 100644 --- a/vignettes/pedigree.html +++ b/vignettes/pedigree.html @@ -350,47 +350,53 @@
Unlike Tolstoy, where only happy families are alike, all pedigrees -are alike. Moreover, such pedigrees can be simulated as a function of -several parameters, including the number of children per mate, -generations, sex ratio of newborns, and mating rate
-The simulation function provides users the opportunity to test family -models in pedigrees with a customized pedigree length and width. Since -data in the form of large family pedigrees is difficult to collect or -access, simulated pedigrees serve as an efficient tool for building -statistical models using family data and evaluating the statistical -properties of the model, such as power, bias, and computational +
Unlike Tolstoy, where only happy families are alike, all
+pedigrees are alike – or at least, all simulated pedigrees are alike.
+The simulatePedigree
function generates a pedigree with a
+user-specified number of generations and individuals per generation.
+This function provides users the opportunity to test family models in
+pedigrees with a customized pedigree length and width.
These pedigrees can be simulated as a function of several parameters, +including the number of children per mate, generations, sex ratio of +newborns, and mating rate. Given that large family pedigrees are +difficult to collect or access, simulated pedigrees serve as an +efficient tool for researchers. These simulated pedigrees are useful for +building family-based statistical models and evaluating their +statistical properties, such as power, bias, and computational +efficiency.
-For example, a pedigree that follows these conditions: There are a -total of four generations in which each mating produces four offspring. -The number of male and female newborns is equal. 70% of individuals mate -and bear offspring. Such a pedigree structure can be simulated by +
To illustrate this, let us generate a pedigree. This pedigree has a +total of four generations, in which each person who “mates” grows a +family with four offspring. In our scenario, the number of male and +female newborns is equal. In this illustration, 70% of individuals will +mate and bear offspring. Such a pedigree structure can be simulated by +running:
set.seed(5)
-Ped <- SimPed(kpc = 4, Ngen = 4, sexR = .5, marR = .7)
-
-summary(Ped)
-#> fam ID gen dadID
-#> Length:57 Min. : 10011 Min. :1.000 Min. : 10012
-#> Class :character 1st Qu.: 10036 1st Qu.:3.000 1st Qu.: 10024
-#> Mode :character Median :100312 Median :3.000 Median : 10037
-#> Mean : 59171 Mean :3.298 Mean : 42859
-#> 3rd Qu.:100416 3rd Qu.:4.000 3rd Qu.:100311
-#> Max. :100432 Max. :4.000 Max. :100320
-#> NA's :13
-#> momID spt sex
-#> Min. : 10011 Min. : 10011 Length:57
-#> 1st Qu.: 10022 1st Qu.: 10025 Class :character
-#> Median : 10036 Median : 10036 Mode :character
-#> Mean : 42859 Mean : 40124
-#> 3rd Qu.:100316 3rd Qu.:100311
-#> Max. :100318 Max. :100320
-#> NA's :13 NA's :33
The simulation output is a data.frame
with 57 rows and 7
columns. Each row corresponds to a simulated individual.
Ped[21, ]
+
The columns represent the individual’s family ID, the individual’s
@@ -410,10 +416,10 @@
Single Pedigree Visualization
To visualize a single simulated pedigree, use the
plotPedigree()
function.
# Plot the simulated pedigree
-plotPedigree(Ped)
+plotPedigree(df_ped)
#> Pedigree object with 57 subjects, family id= 1
#> Bit size= 75
-
+
#> Did not plot the following people: 10032
In the resulting plot, biological males are represented by squares,
while biological females are represented by circles, following the
@@ -425,10 +431,10 @@
Visualizing Multiple Pedigrees Side-by-Side
them together. For instance, let’s visualize pedigrees for families
spanning three and four generations, respectively.
# Simulate a family with 3 generations
-df_ped_3 <- SimPed(Ngen = 3)
+df_ped_3 <- simulatePedigree(Ngen = 3)
# Simulate a family with 4 generations
-df_ped_4 <- SimPed(Ngen = 4)
+df_ped_4 <- simulatePedigree(Ngen = 4)
# Set up plotting parameters for side-by-side display
par(mfrow = c(1, 2))
@@ -442,7 +448,7 @@ Visualizing Multiple Pedigrees Side-by-Side
plotPedigree(df_ped_4, width = 1)
#> Pedigree object with 29 subjects, family id= 1
#> Bit size= 34
-
+
By examining the side-by-side plots, you can contrast and analyze the
structures of different families, tracing the inheritance of specific
traits or conditions if needed.
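
Family structures like those plotted above can also be compared numerically. The following is a minimal base-R sketch, assuming a hypothetical six-person pedigree that uses only the `ID`, `dadID`, and `momID` columns seen in the simulated output. It shows how path tracing over the whole pedigree can turn mother and father identifiers into additive relatedness coefficients. It illustrates the general technique and is not `BGmisc`'s own implementation; the object names (`ped`, `A`, `paths`, `D`, `R`) are made up for this example, and the sketch assumes no inbreeding.

```r
# Hypothetical six-person pedigree in the ID / dadID / momID layout.
# Persons 1, 2, and 5 are founders (parents unknown, coded NA).
ped <- data.frame(
  ID    = 1:6,
  dadID = c(NA, NA, 1, 1, NA, 5),
  momID = c(NA, NA, 2, 2, NA, 3)
)

n <- nrow(ped)
ids <- as.character(ped$ID)

# Path matrix: entry [child, parent] is 0.5, the share of genes
# transmitted along a single parent-to-child path.
A <- matrix(0, n, n, dimnames = list(ids, ids))
for (i in seq_len(n)) {
  for (parent in c(ped$dadID[i], ped$momID[i])) {
    if (!is.na(parent)) A[i, match(parent, ped$ID)] <- 0.5
  }
}

# (I - A)^{-1} accumulates the paths of every length from ancestors
# to descendants in one step, without recursion.
paths <- solve(diag(n) - A)

# Variance unique to each person: 1 for founders, 0.75 with one known
# parent, 0.5 with both parents known (assumes no inbreeding).
n_known <- rowSums(!is.na(ped[, c("dadID", "momID")]))
D <- diag(1 - 0.25 * unname(n_known))

# Additive relatedness between all pairs of individuals.
R <- paths %*% D %*% t(paths)
round(R, 3)
```

In this toy pedigree, the full siblings (persons 3 and 4) are related at 0.5, and grandparent 1 and grandchild 6 at 0.25, which matches the values expected from classic path-tracing rules.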