
# (PART\*) Path Diagrams {-}

# Introduction to Path Diagrams

```{r include = FALSE}
source("admin/common.R")
```

Path diagrams are a graphical representation of a structural equation model. They are a useful tool for understanding the relationships between variables in a model, as well as a way to communicate the model to others.

In this chapter, we will learn how to create path diagrams using dot notation (the DOT language used by Graphviz), and we will explore several R packages that can create path diagrams: `DiagrammeR`, `OpenMx`, and `umx`.


## Constructing a Simple Path Diagram


To better understand path diagrams, let's manually construct a simple example. Consider a model where one latent variable influences two observed variables. We will represent this model in dot notation after reviewing the components of a path diagram.

### Understanding the Components
- **Nodes:** Represent variables, which can be observed or latent. Nodes are usually depicted as circles (latent variables) or squares (observed variables).
- **Edges:** Represent the causal relationships or correlations between variables. An arrow from one node to another indicates a directional relationship, whereas a two-headed arrow indicates a correlation.
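
As a small illustration of the two edge types, the sketch below writes both kinds of arrow in dot notation. The node names `X`, `Y`, and `Z` and the object name `edge_demo` are made up for this example. The diagram is rendered only if `DiagrammeR` is installed; otherwise the DOT text is printed.

```{r}
# Hypothetical node names (X, Y, Z), chosen only for illustration.
edge_demo <- "
digraph edge_types {
  node [shape = box, fontname = Arial, fontsize = 10]

  /* Single-headed arrow: X influences Y */
  X -> Y

  /* Two-headed arrow: X and Z are correlated */
  X -> Z [dir = both]
}
"

# Render when DiagrammeR is available; otherwise just show the DOT text.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  DiagrammeR::grViz(edge_demo)
} else {
  cat(edge_demo)
}
```

In DOT, every edge is written with `->` inside a `digraph`; the `dir = both` attribute is what turns a directional edge into a two-headed one.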


### Step-by-Step Construction
1. **Define the Nodes**: Start by defining your nodes, which represent the variables. Here, we have one latent variable and two observed variables.
2. **Draw the Edges**: Next, draw edges to represent the relationships. In this case, the latent variable influences both observed variables.

#### Example Diagram

Let's put these components together using the `DiagrammeR` package to draw the model:

```{r simple-path-diagram}
library(DiagrammeR)
grViz("
  digraph simple_model {
    node [fontname = Arial, fontsize = 10]
    Latent [shape = circle, label = 'Latent Variable L']
    Obs1 [shape = box, label = 'Observed Variable 1']
    Obs2 [shape = box, label = 'Observed Variable 2']
    Latent -> Obs1
    Latent -> Obs2
  }
")
```


This script uses the `DiagrammeR` package to manually create a path diagram where 'Latent Variable L' influences 'Observed Variable 1' and 'Observed Variable 2'. The arrows indicate the direction of influence from the latent to the observed variables.
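
Path diagrams often display a path coefficient on each arrow. The following is a minimal sketch of one way to do that, assuming two made-up loadings (0.8 and 0.6, which are not estimates from any data). It builds the DOT text with `sprintf()` and attaches each loading as an edge `label`. The object names `loadings`, `edge_lines`, and `labelled_dot` are hypothetical.

```{r}
# Hypothetical loadings, chosen only to illustrate edge labels (not estimates).
loadings <- c(Obs1 = 0.8, Obs2 = 0.6)

# One DOT edge statement per observed variable, with the loading as its label.
edge_lines <- sprintf('    Latent -> %s [label = "%.2f"]', names(loadings), loadings)

# Assemble the full DOT specification as a single string.
labelled_dot <- paste(c(
  "digraph labelled_model {",
  "    node [fontname = Arial, fontsize = 10]",
  "    Latent [shape = circle]",
  "    Obs1 [shape = box]",
  "    Obs2 [shape = box]",
  edge_lines,
  "}"
), collapse = "\n")

# Render when DiagrammeR is available; otherwise just show the DOT text.
if (requireNamespace("DiagrammeR", quietly = TRUE)) {
  DiagrammeR::grViz(labelled_dot)
} else {
  cat(labelled_dot, "\n")
}
```

Building the specification from a named vector keeps the diagram in step with the numbers: changing a value in `loadings` changes the corresponding label.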


### ACE Model Example


Here is a more complex example of a path diagram for a univariate ACE model using dot notation and the `DiagrammeR` package in R.

```{r}
library(DiagrammeR)
grViz('digraph "Univariate ACE Model" {
	node [style=filled, fontname="Arial", fontsize=16];

	/* Observed Trait */
	Trait [shape=square, fillcolor="#a9fab1", height=0.5, width=0.5, label="Trait"];

	/* Latent Variables */
	A [shape=circle, fillcolor="#f4fd78", label="A"];
	C [shape=circle, fillcolor="#f4fd78", label="C"];
	E [shape=circle, fillcolor="#f4fd78", label="E"];

	/* Paths from Latent Variables to Observed Trait */
	A -> Trait [dir=forward];
	C -> Trait [dir=forward];
	E -> Trait [dir=forward];

	/* Variance Paths for Latent Variables */
	A -> A [dir=both, headport=n, tailport=n];
	C -> C [dir=both, headport=n, tailport=n];
	E -> E [dir=both, headport=n, tailport=n];
}'
)
```

Below is an explanation of the code snippet:

Latent Variables:

- `A [shape=circle, fillcolor="#f4fd78", label="A"];`: Defines the latent variable A (Additive genetic factors) with a circular shape, yellow color (#f4fd78), and the label "A".
- `C [shape=circle, fillcolor="#f4fd78", label="C"];`: Defines the latent variable C (Common/shared environmental factors) with a circular shape, yellow color, and the label "C".
- `E [shape=circle, fillcolor="#f4fd78", label="E"];`: Defines the latent variable E (Unique environmental factors) with a circular shape, yellow color, and the label "E".

Paths from Latent Variables to Observed Trait:

- `A -> Trait [dir=forward];`: Creates a forward directional path from the latent variable A to the observed trait.
- `C -> Trait [dir=forward];`: Creates a forward directional path from the latent variable C to the observed trait.
- `E -> Trait [dir=forward];`: Creates a forward directional path from the latent variable E to the observed trait.

Variance Paths for Latent Variables:

- `A -> A [dir=both, headport=n, tailport=n];`: Represents the variance of the latent variable A with a bidirectional path.
- `C -> C [dir=both, headport=n, tailport=n];`: Represents the variance of the latent variable C with a bidirectional path.
- `E -> E [dir=both, headport=n, tailport=n];`: Represents the variance of the latent variable E with a bidirectional path.

The resulting path diagram visualizes the relationships between the latent variables (A, C, E) and the observed trait in an ACE model.
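
To connect the diagram to numbers, the sketch below applies path tracing to the ACE diagram. It assumes that each latent variable has variance 1 and that A, C, and E are uncorrelated, so the trait variance is the sum of the squared path coefficients. The coefficient values and the object names (`ace_paths`, `components`, `proportions`) are hypothetical and serve only as an illustration.

```{r}
# Hypothetical path coefficients for A -> Trait, C -> Trait, and E -> Trait.
ace_paths <- c(a = 0.7, c = 0.5, e = 0.5)

# Assumption: Var(A) = Var(C) = Var(E) = 1 and the latent variables are uncorrelated.
# Each latent variable then contributes (path coefficient)^2 to the trait variance.
components <- ace_paths^2
total_var  <- sum(components)

# Proportion of trait variance attributed to each source.
proportions <- components / total_var
round(proportions, 3)
```

Under these assumptions the proportion for `a` is the standardized additive genetic variance, and the three proportions sum to 1.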


## Pre-existing software


There are several existing software tools that can be used to create path diagrams for specific models, such as the `umx` and `OpenMx` packages in R. These tools provide a user-friendly interface for creating and visualizing path diagrams, making it easier to understand and communicate complex models.


### Creating a Path Diagram with `omxGraphviz`

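```{r}
library(OpenMx)
data(demoOneFactor)
manifests <- names(demoOneFactor)  # observed variables x1 to x5
latents <- c("G1")                 # single latent factor
model1 <- mxModel("One Factor", type = "RAM",
    manifestVars = manifests,
    latentVars = latents,
    # Factor loadings: single-headed paths from G1 to each manifest variable
    mxPath(from = latents, to = manifests),
    # Residual variances: two-headed paths on each manifest variable
    mxPath(from = manifests, arrows = 2),
    # Latent variance: two-headed path on G1, fixed at 1
    mxPath(from = latents, arrows = 2, free = FALSE, values = 1.0),
    mxData(cov(demoOneFactor), type = "cov", numObs = 500)
)
# Write the model's path diagram to a dot file
omxGraphviz(model1, "one-factor-generated.dot")
```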

The preceding code snippet specifies a one-factor model using the OpenMx package and writes its path diagram to a dot file. The model includes one latent variable (G1) and five manifest variables (x1 to x5). The `mxModel` function defines the model, `mxPath` specifies the paths between variables, and `mxData` supplies the data, here the covariance matrix of `demoOneFactor`. Finally, `omxGraphviz` generates a graphical representation of the model in the form of a dot file.

I have annotated the dot file below to explain the purpose of each statement and attribute. The resulting dot file can be visualized using graph visualization tools like Graphviz or packages like `DiagrammeR`.

```
digraph "One Factor" {
  // Setting the style and font for all nodes
  node [style=filled, fontname="Arial", fontsize=16];
  
  /* Manifest Variables */
  // Placing all manifest variables (x1 to x5) on the same (maximum) rank so they line up in one row
  { rank = max; x1; x2; x3; x4; x5 }
  
  // Defining each manifest variable with square shape, color, and size
  x1 [shape=square, fillcolor="#a9fab1", height=0.5, width=0.5];
  x2 [shape=square, fillcolor="#a9fab1", height=0.5, width=0.5];
  x3 [shape=square, fillcolor="#a9fab1", height=0.5, width=0.5];
  x4 [shape=square, fillcolor="#a9fab1", height=0.5, width=0.5];
  x5 [shape=square, fillcolor="#a9fab1", height=0.5, width=0.5];
  
  /* Latent Variables */
    // Defining the latent variable G1 with a circular shape and color
  G1 [shape=circle, fillcolor="#f4fd78"];
  
  /* Paths */
    // Defining directional paths from the latent variable G1 to each manifest variable
  G1 -> x1[dir=forward];
  G1 -> x2[dir=forward];
  G1 -> x3[dir=forward];
  G1 -> x4[dir=forward];
  G1 -> x5[dir=forward];
  
  // Defining bidirectional paths for each manifest variable to represent error terms
  x1 -> x1[dir=both, headport=s, tailport=s];
  x2 -> x2[dir=both, headport=s, tailport=s];
  x3 -> x3[dir=both, headport=s, tailport=s];
  x4 -> x4[dir=both, headport=s, tailport=s];
  x5 -> x5[dir=both, headport=s, tailport=s];
  
  // Defining a bidirectional path for the latent variable G1 to represent its variance
  G1 -> G1[dir=both, headport=n, tailport=n];
}
```

We can load the dot file into R and visualize the path diagram using the `DiagrammeR` package.

```{r}
library(DiagrammeR)

grViz("one-factor-generated.dot")
```

The resulting path diagram shows the relationships between the latent variable G1 and the manifest variables x1, x2, x3, x4, and x5. The directional paths from G1 to the manifest variables represent the factor loadings, while the bidirectional paths for each manifest variable represent the error terms. The bidirectional path for the latent variable G1 represents its variance.
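
The three kinds of path in this diagram combine into a model-implied covariance matrix. The sketch below shows the arithmetic for a one-factor model under path tracing: the covariance between two manifest variables is the product of their loadings times the latent variance, and each variance adds the variable's residual variance. The parameter values are hypothetical (they are not estimates from `demoOneFactor`), and the object names are made up for this example.

```{r}
# Hypothetical parameter values, NOT estimates from demoOneFactor.
lambda <- c(x1 = 0.4, x2 = 0.5, x3 = 0.6, x4 = 0.7, x5 = 0.8)  # loadings (G1 -> x)
theta  <- rep(0.04, 5)                                          # residual variances (x <-> x)
var_G1 <- 1                                                     # latent variance (G1 <-> G1)

# Implied covariance: loadings * Var(G1) * loadings', plus residual variances on the diagonal.
implied_cov <- var_G1 * outer(lambda, lambda) + diag(theta)
dimnames(implied_cov) <- list(names(lambda), names(lambda))

round(implied_cov, 2)
```

Fitting the model amounts to choosing the free parameters so that this implied matrix is as close as possible to the observed covariance matrix.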


### Creating a Path Diagram for an ACE Model Using the `umx` Package


The `umx` package provides a user-friendly interface for specifying and estimating structural equation models, including classical models such as the ACE model used in twin studies, and for plotting them as path diagrams. Below is an example of how to create a path diagram for an ACE model using the `umx` package in R. (Many thanks to Tim Bates for providing the source code.)

```{r}
library(umx)
library(tidyverse)
# Thanks to Tim Bates for making some nice tidy Source Code
# =====================
# = Make an ACE model =
# =====================
# 1. Clean data: Add separator and scale
data(twinData)
tmp <- umx_make_twin_data_nice(data=twinData, 
                               sep="", zygosity="zygosity", 
                               numbering=1:2) %>% 
  umx_scale_wide_twin_data(varsToScale= c("wt", "ht"), 
                           sep= "_T",
                           data= .)
mzData <- subset(tmp, zygosity %in%  
                   c("MZFF", "MZMM"))
dzData <- subset(tmp, zygosity %in%  
                   c("DZFF", "DZMM"))

# 2. Define paths: You only need the paths for one person:
paths <- c(
  umxPath(v1m0 = c("a1", 'c1', "e1")),
  umxPath(means = c("wt")),
  umxPath(c("a1", 'c1', "e1"), to = "wt", values=.2)
)
m1 <- umxTwinMaker("test", paths, mzData = mzData, dzData = dzData)

plot(m1, std= TRUE, means= FALSE)

```

The resulting path diagram shows the relationships between the latent variables A, C, and E (named `a1`, `c1`, and `e1` in the code) and the manifest variable `wt`. The single-headed paths from the latent variables to `wt` are the path coefficients. The bidirectional path on each latent variable represents its variance, which `umxPath(v1m0 = ...)` fixes at 1 (with the mean fixed at 0). Unlike the one-factor model above, `wt` has no separate residual term in this specification: variance not attributed to A or C is captured by E.

The paths are specified for one twin only; `umxTwinMaker` expands them to both members of a pair, and the model is fitted to the MZ and DZ twin-pair data to estimate the genetic, shared environmental, and non-shared environmental influences on the trait.
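
The reason both MZ and DZ pairs are needed can be sketched with the expected covariance matrices of a twin pair. The code below assumes the classical twin design: MZ twins share all of their additive genetic influences, DZ twins share half on average, and both types share C completely. The path values, the helper `expected_twin_cov()`, and the labels `wt_T1` and `wt_T2` are hypothetical and only illustrate the arithmetic; this is not how `umx` computes its expectations internally.

```{r}
# Hypothetical helper: expected 2 x 2 covariance matrix for one twin pair.
# r_a is the assumed genetic correlation between twins (1 for MZ, 0.5 for DZ).
expected_twin_cov <- function(a_path, c_path, e_path, r_a) {
  trait_var <- a_path^2 + c_path^2 + e_path^2  # variance within each twin
  twin_cov  <- r_a * a_path^2 + c_path^2       # covariance between the two twins
  matrix(c(trait_var, twin_cov, twin_cov, trait_var), nrow = 2,
         dimnames = list(c("wt_T1", "wt_T2"), c("wt_T1", "wt_T2")))
}

# Hypothetical path coefficients, not estimates from twinData.
mz_cov <- expected_twin_cov(0.7, 0.5, 0.5, r_a = 1)
dz_cov <- expected_twin_cov(0.7, 0.5, 0.5, r_a = 0.5)

mz_cov
dz_cov
```

The MZ and DZ matrices differ only in the cross-twin covariance, and that difference is what allows the A and C contributions to be separated.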