-
Notifications
You must be signed in to change notification settings - Fork 0
G Coefficients
A good test has to measure what it claims to measure. Performance tests provide mean test scores for candidates
The test results of record are the mean scores of each student. But desired ranking of students depends on their ability. So the indicator of test quality (G Coefficient) is the ratio of the variance of student ability effect to the variance of their mean scores. In other words: "how much of the test score variance is explained by the ability effect variance?". Mathematically, that looks like this:
More generally speaking, the literature recognizes two types of G Coefficients: Eρ2, and Φ, depending on how the student score variance has been defined. Unfortunately we don't get around looking more carefully at 'variance', and how it is calculated.
Let us consider a set of N random numbers yk, where k ranges from 0 to N-1. As we have seen, its arithmetic mean is calculated as:
It turns out, however, that the mean
Let us now get back to the output of urGenova, after it has processed the data entered, we are looking at the variance components for the two facets - student, question, and their interaction:
We are left with figuring out, how to express
where σ2(τ); stands for the sum of all variance components that contain the facet of differentiation, but not any facets of generalization (Brennan's Rule I.).
Depending on the type of G Coefficient we want to calculate, formula (7) becomes:
or
where σ2(Δ) is the sum of all variance components, except for σ2(τ) itself, divided by the size product of the respective facets of generalization (Brennan's Rule II.), and σ2(δ) the sum of all variance components that contains the facet of differentiation and at least one facet of generalization, divided by the sample size product of such. (Brennan's Rule III.)
Both Brennan's Rule II, and Brennan's Rule III require summing variances of means for facets of generalization. According to equation 6 this requires division of the sample variance by the total number of sample items. For set variances of single facet sets, this is simply the sample size of this facet. For purely crossed facets, it becomes the product of the facet sample sizes. But in the case of nested facets, there are multiple sample sizes, depending on the value of the nesting facet. In this case it becomes necessary to first calculate a mean of sample sizes for the nested facet. But it is no longer the arithmetic mean, the harmonic mean is required.
As a general convention, Eρ2 should have at least a value of 0.80 for 'High Stakes Exams'.
Youtubes
- Generalizability Analysis I: Facets & Variance
- Generalizability Analysis II: Systematic Bias
- Generalizability Analysis III: Missing Data, and Replications
- Installing G-String in MacOS
For G_String_M users*
- Using G_String_M
- Understanding Generalizability Analysis
- Explore Generalizability Analysis with R
- When something doesn't seem to work
- How to join the User Group
For IT Professionals