Association refers to coefficients which gauge the strength of a relationship.
The coefficients in this section are designed for use with nominal data. Phi and
Cramer's V are based on adjusting chi-square to factor out sample
size. These measures do not lend themselves to easy interpretation. Cramer's V
varies between 0 and 1; phi does as well for 2-by-2 tables, though it can
exceed 1 for larger tables.
Key Concepts and Terms
Phi.
Phi is a chi-square based measure of association. The chi-square
coefficient depends on the strength of the relationship and sample size.
Phi eliminates sample size by dividing chi-square by n, the sample size,
and taking the square root.
Since phi has a known sampling distribution it is
possible to compute its standard error and significance. SPSS and other major
packages report the significance level of the computed phi value.
Computationally, phi is the square root of chi-square
divided by n, the sample size: phi = SQRT(chi-square/n). When computing phi,
note that Yates' correction to chi-square is not used. Phi thus measures the
strength of the relationship defined as the number of cases on one diagonal
minus the number on the other diagonal, adjusting for the marginal distribution
of the variables.
Example. For the table below, phi = .41.

Party/Vote      Democrat   Republican
Voted               15         10
Didn't Vote          5         20
Interpretation: In
2-by-2 tables, phi can be interpreted as a symmetric percent difference,
measuring the percent concentration of cases on the diagonal. Also in
2-by-2 tables, phi is identical to the correlation coefficient. In larger
tables, where phi may be greater than 1.0, there is no simple intuitive
interpretation, which is a reason why phi is often used only for 2-by-2
tables. For the example above, phi is .41. The percent difference with
party as independent (column) is 42%, and with vote as independent is 40%.
Phi is the mean percent difference between party and vote with either
considered as causing the other.
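The phi computation for this example can be sketched in a few lines of Python (a minimal sketch: the chi-square is computed from scratch, without Yates' correction, and the table values come from the example above):

```python
import math

# Hypothetical Party/Vote table from the example above.
table = [[15, 10],   # Voted:       Democrat, Republican
         [5, 20]]    # Didn't Vote: Democrat, Republican

row_totals = [sum(row) for row in table]          # [25, 25]
col_totals = [sum(col) for col in zip(*table)]    # [20, 30]
n = sum(row_totals)                               # 50

# Pearson chi-square: sum of (observed - expected)^2 / expected,
# with expected = row total * column total / n. No Yates' correction.
x2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
         / (row_totals[i] * col_totals[j] / n)
         for i, row in enumerate(table)
         for j, obs in enumerate(row))

phi = math.sqrt(x2 / n)
print(round(phi, 2))  # 0.41

# Percent difference with party (columns) as the independent variable:
p_dem = table[0][0] / col_totals[0]   # 15/20 voted among Democrats
p_rep = table[0][1] / col_totals[1]   # 10/30 voted among Republicans
print(round(100 * (p_dem - p_rep)))   # 42
```

Repeating the percent-difference step with vote as independent gives 40%, and the mean of the two (41%) matches phi for this table.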
Meaning of association:
Phi defines perfect association as predictive monotonicity
(see discussion in the section on association),
and defines a null relationship as statistical independence.
Symmetricalness:
Phi is a symmetrical measure. It makes no difference which is the
independent (column) variable. Phi tends to understate asymmetric
relationships. Note, however, that some computer packages, such as Systat, use special formulas for phi in 2-by-2 tables
so that phi varies from -1 to +1, allowing it to indicate negative
relationships when used with dichotomous ordinal
data. However, phi is still a symmetric measure and the sign can be
ignored when using dichotomous nominal data.
Data level: Phi is
used with nominal data, though for 2-by-2 tables, data may be ordinal.
Other features: Phi is
very sensitive to shifts in marginal distributions. Phi is not normed to
range from 0 to 1. For tables larger than 2-by-2, the maximum value of phi
is the square root of (k - 1), where k is the number of rows or the number
of columns, whichever is smaller. This means phi can be greater than 1.0
for larger tables; its maximum differs depending on table size and grows
without bound as tables get larger.
Cramer's
V. Cramer's V is the most popular of the chi-square-based measures of
nominal association because it norms well from 0 to 1 regardless of table
size, provided row marginals equal column marginals. V equals the square
root of chi-square divided by the product of the sample size, n, and m,
the smaller of (rows - 1) or (columns - 1): V = SQRT(chi-square/(n*m)).
Since V has a known sampling distribution it is
possible to compute its standard error and significance. SPSS and other major
packages report the significance level of the computed V value. The formula for
the variance of Cramer's V is given in Liebetrau
(1983: 15-16).
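The computation under the formula above can be sketched as follows (the helper name and table values are illustrative, not from any particular package):

```python
import math

def cramers_v(table):
    """Cramer's V = SQRT(chi-square / (n * m)), m = min(rows - 1, cols - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square from observed and expected cell counts.
    x2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
             / (row_totals[i] * col_totals[j] / n)
             for i, row in enumerate(table)
             for j, obs in enumerate(row))
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(x2 / (n * m))

# For a 2-by-2 table m = 1, so V reduces to phi:
print(round(cramers_v([[15, 10], [5, 20]]), 2))  # 0.41
```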
Interpretation: V may
be viewed as the association between two variables as a percentage of
their maximum possible variation. V-squared is the mean square canonical
correlation between the variables. For 2-by-2 tables, V = phi (hence some
packages, like Systat, print V only for larger tables).
Meaning of association:
V defines a perfect relationship as one which is predictive or ordered
monotonic, and defines a null relationship as statistical independence, as
discussed in the section on association. However, the more unequal the marginals, the further V falls below 1.0.
Symmetricalness:
V is a symmetrical measure. It does not matter which is the independent
(column) variable.
Data level: V may be
used with nominal data or higher.
Other features: V can
reach 1.0 only when the two variables have equal marginals.
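A quick numeric illustration of this property (a self-contained sketch; the helper and the tables are hypothetical):

```python
import math

def cramers_v(table):
    # Cramer's V = SQRT(chi-square / (n * m)), m = min(rows - 1, cols - 1).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
             / (row_totals[i] * col_totals[j] / n)
             for i, row in enumerate(table)
             for j, obs in enumerate(row))
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(x2 / (n * m))

# Equal marginals with all cases on the diagonal: V reaches 1.0.
print(round(cramers_v([[10, 0, 0], [0, 10, 0], [0, 0, 10]]), 2))  # 1.0

# Unequal marginals: the table cannot be fully diagonal, so V stays below
# 1.0 even though one category perfectly predicts the other here
# (all of column 1 falls in row 1).
print(round(cramers_v([[10, 5], [0, 15]]), 2))  # 0.71
```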
Assumptions
Assumptions for each
coefficient are discussed above.
Note that measures of association,
unlike tests of significance, do not assume randomly sampled data.
Equal marginals.
Cramer's V and all measures which define a perfect relationship in terms
of strict monotonicity require that the marginal
distributions of the two variables be equal for the coefficient to reach
1.0.
Bibliography
Agresti, Alan (1996). An introduction to categorical data analysis. NY: John
Wiley and Sons.
Goodman, Leo A. and W. H. Kruskal (1954, 1959, 1963, 1972). Measures of
association for cross classifications, I, II, III, and IV. Journal of the
American Statistical Association 49: 732-764; 54: 123-163; 58: 310-364;
and 67: 415-421, respectively. The 1972 installment discusses the
uncertainty coefficient.
Liebetrau, Albert M. (1983). Measures of association. Newbury Park, CA: Sage
Publications. Quantitative Applications in the Social
Sciences Series No. 32.
Rosenberg, M. (1968). The logic of survey analysis. NY: Basic Books.