G Coefficients

A good test has to measure what it claims to measure. Performance tests report a mean score for each candidate, $\overline X_{i} =\frac{1}{30} \sum \limits_{j=0}^{29}X_{i,j}$, but claim to rate the candidate's ability. The more closely the candidates' test scores reflect their abilities, the more generalizable the test is.

The test results of record are the students' mean scores, but the desired ranking of students depends on their ability. The indicator of test quality (the G Coefficient) is therefore the ratio of the variance of the student ability effect to the variance of the mean scores. In other words: "How much of the test score variance is explained by the ability effect variance?" Mathematically, that looks like this:

$$ \huge {\text{G Coefficient} = \frac{\sigma^2_s}{\sigma^2(\overline{X_i})} \qquad \text{(Equation 2)}} $$

More generally, the literature recognizes two types of G Coefficients, Eρ² and Φ, depending on how the student score variance is defined. To tell them apart, we first have to look more carefully at 'variance' and how it is calculated.

Let us consider a set of N random numbers y_k, where k ranges from 0 to N-1. As we have seen, its arithmetic mean is calculated as:

$$\large \overline y =\frac{1}{N} \sum \limits_{k=0}^{N-1}y_{k} \qquad \text{(Equation 3)}$$

while its harmonic mean results from:

$$\large \tilde y=\frac{N}{\sum \limits_{k=0}^{N-1} \frac{1}{y_k}} \qquad \text{(Equation 4)}$$

and its variance from:

$$\large \sigma^2 (y) = \frac{1}{N-1} \sum \limits_{k=0}^{N-1}(y_{k}-\overline y)^{2} \qquad \text{(Equation 5)}$$

However, the mean $\overline y$ is itself calculated from random numbers, so it is not a fixed value. It has its own, admittedly smaller, variance (the variance of the mean):

$$\large \sigma^{2} (\overline y)=\frac{\sigma^2 (y)}{N} \qquad \text{(Equation 6)}$$
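Equations 3 to 6 can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the sample values in `y` are made up for illustration and do not come from any test data.

```python
import statistics

# Hypothetical sample of N = 5 values (illustrative only).
y = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(y)

arithmetic_mean = statistics.mean(y)           # Equation 3
harmonic_mean = statistics.harmonic_mean(y)    # Equation 4
sample_variance = statistics.variance(y)       # Equation 5 (divides by N - 1)
variance_of_mean = sample_variance / n         # Equation 6

print(arithmetic_mean, harmonic_mean, sample_variance, variance_of_mean)
```

Note that `statistics.variance` uses the N-1 denominator of Equation 5, whereas `statistics.pvariance` divides by N and would not match.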

Let us now return to the output of urGenova. After it has processed the data entered, we are looking at the variance components for the two facets, student and question, and for their interaction:

Variance Components

What remains is to express $\sigma^2(\overline{X_i})$ in terms of the calculated variance components. We must keep in mind that we are interested in the variance of the facet of differentiation in relation to the total score variance, i.e. averaged over all facets of generalization. This leads to the following expression:

$$ \huge \text{G Coefficient} = \frac{\sigma^2 (\tau)}{\sigma^2(\tau)+ \sigma^2(\dots)} \qquad \text{(Equation 7)} $$

where σ²(τ) stands for the sum of all variance components that contain the facet of differentiation, but not any facets of generalization (Brennan's Rule I).

Depending on the type of G Coefficient we want to calculate, formula (7) becomes:

$$ \large \text{Generalization coefficient: } \huge\qquad E\rho^2 = \frac{\sigma^2 (\tau)}{\sigma^2(\tau)+ \sigma^2(\delta)} \qquad \text{(Equation 7a)} $$

or

$$ \large \text{Index of dependability: } \huge\qquad \Phi = \frac{\sigma^2 (\tau)}{\sigma^2(\tau)+ \sigma^2(\Delta)} \qquad \text{(Equation 7b)} $$

where σ²(Δ) is the sum of all variance components except σ²(τ) itself, each divided by the product of the sample sizes of the facets of generalization it contains (Brennan's Rule II), and σ²(δ) is the sum of all variance components that contain the facet of differentiation and at least one facet of generalization, each divided by the corresponding sample size product (Brennan's Rule III).
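For a crossed student × question design, the three rules can be sketched as follows. This is only an illustration: the function name `g_coefficients`, its parameters, and the variance component values are assumptions, not urGenova output.

```python
def g_coefficients(var_s, var_q, var_sq, n_q):
    """Return (E_rho2, Phi) for a crossed student x question design.

    var_s  : variance component of students (facet of differentiation)
    var_q  : variance component of questions (facet of generalization)
    var_sq : student x question interaction component
    n_q    : number of questions each student answers
    """
    tau = var_s                          # Rule I:   sigma^2(tau)
    abs_error = (var_q + var_sq) / n_q   # Rule II:  sigma^2(Delta)
    rel_error = var_sq / n_q             # Rule III: sigma^2(delta)
    e_rho2 = tau / (tau + rel_error)     # Equation 7a
    phi = tau / (tau + abs_error)        # Equation 7b
    return e_rho2, phi


# Hypothetical variance components for a 30-question test.
e_rho2, phi = g_coefficients(var_s=0.5, var_q=0.3, var_sq=3.0, n_q=30)
print(round(e_rho2, 3), round(phi, 3))
```

Because σ²(Δ) includes every term of σ²(δ) plus the question component, Φ can never exceed Eρ² for the same data.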

Both Brennan's Rule II and Brennan's Rule III require summing variances of means over facets of generalization. According to Equation 6, this means dividing each variance component by the total number of sampled items. For a component that involves a single facet of generalization, the divisor is simply the sample size of that facet. For purely crossed facets, it is the product of the facet sample sizes. For nested facets, however, there are multiple sample sizes, depending on the level of the nesting facet. In that case a mean sample size for the nested facet has to be calculated first, and the mean required is the harmonic mean (Equation 4), not the arithmetic mean.
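One way to see why the harmonic mean is the right choice: averaging the per-level variances of the mean, σ²/n, over all levels gives the same result as dividing σ² by the harmonic mean of the sample sizes. The sketch below checks this numerically; the sample sizes and the variance component are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical sample sizes of a nested facet,
# one per level of the nesting facet.
nested_sizes = [8, 10, 12]

arithmetic_n = statistics.mean(nested_sizes)
harmonic_n = statistics.harmonic_mean(nested_sizes)

# Hypothetical variance component of the nested facet.
var_component = 2.4

# Variance of the mean at each level (Equation 6), then averaged.
mean_of_variances = statistics.mean([var_component / n for n in nested_sizes])

# Dividing by the harmonic mean reproduces that average.
via_harmonic = var_component / harmonic_n

print(arithmetic_n, harmonic_n, mean_of_variances, via_harmonic)
```

Dividing by the arithmetic mean instead would give a smaller error variance whenever the sample sizes differ, and so a slightly more favorable coefficient.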

As a general convention, Eρ² should have at least a value of 0.80 for 'High Stakes Exams'.
