Association refers to coefficients which gauge the strength of a relationship.
The coefficients in this section are designed for use with nominal data. Phi and
Cramer's V are based on adjusting chi-square to factor out sample
size. These measures do not lend themselves to easy interpretation. Cramer's V
varies between 0 and 1; phi does as well for 2-by-2 tables, though it can
exceed 1 for larger tables.
Key Concepts and Terms
Phi.
Phi is a chi-square based measure of association. The chi-square
coefficient depends on the strength of the relationship and sample size.
Phi eliminates sample size by dividing chi-square by n, the sample size,
and taking the square root.
Since phi has a known sampling distribution it is
possible to compute its standard error and significance. SPSS and other major
packages report the significance level of the computed phi value.
Computationally, phi is the square root of chi-square
divided by n, the sample size: phi = SQRT(chi-square/n). When computing phi,
note that Yates' correction to chi-square is not used. Phi thus measures the
strength of the relationship defined as the number of cases on one diagonal
minus the number on the other diagonal, adjusting for the marginal distribution
of the variables.
Example. For the table below, phi = .41.

Party/Vote      Democrat   Republican
Voted               15         10
Didn't Vote          5         20
Interpretation: In
2-by-2 tables, phi can be interpreted as a symmetric percent difference,
measuring the percent concentration of cases on the diagonal. Also in
2-by-2 tables, phi is identical to the correlation coefficient. In larger
tables, where phi may be greater than 1.0, there is no simple intuitive
interpretation, which is a reason why phi is often used only for 2-by-2
tables. For the example above, phi is .41. The percent difference with
party as independent (column) is 42%, and with vote as independent is 40%.
Phi is the mean percent difference between party and vote with either
considered as causing the other.
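The phi computation for this example can be sketched in a few lines of Python (a minimal sketch: the chi-square is computed from scratch, without Yates' correction, and the table values come from the example above):

```python
import math

# Hypothetical Party/Vote table from the example above.
table = [[15, 10],   # Voted:       Democrat, Republican
         [5, 20]]    # Didn't Vote: Democrat, Republican

row_totals = [sum(row) for row in table]          # [25, 25]
col_totals = [sum(col) for col in zip(*table)]    # [20, 30]
n = sum(row_totals)                               # 50

# Pearson chi-square: sum of (observed - expected)^2 / expected,
# with expected = row total * column total / n. No Yates' correction.
x2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
         / (row_totals[i] * col_totals[j] / n)
         for i, row in enumerate(table)
         for j, obs in enumerate(row))

phi = math.sqrt(x2 / n)
print(round(phi, 2))  # 0.41

# Percent difference with party (columns) as the independent variable:
p_dem = table[0][0] / col_totals[0]   # 15/20 voted among Democrats
p_rep = table[0][1] / col_totals[1]   # 10/30 voted among Republicans
print(round(100 * (p_dem - p_rep)))   # 42
```

Repeating the percent-difference step with vote as independent gives 40%, and the mean of the two (41%) matches phi for this table.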
Meaning of association:
Phi defines perfect association as predictive monotonicity
(see discussion in the section on association),
and defines a null relationship as statistical independence.
Symmetricalness:
Phi is a symmetrical measure. It makes no difference which is the
independent (column) variable. Phi tends to understate asymmetric
relationships. Note, however, that some computer packages, such as Systat, use special formulas for phi in 2-by-2 tables
so that phi varies from -1 to +1, allowing it to indicate negative
relationships when used with dichotomous ordinal
data. However, phi is still a symmetric measure and the sign can be
ignored when using dichotomous nominal data.
Data level: Phi is
used with nominal data, though for 2-by-2 tables, data may be ordinal.
Other features: Phi is
very sensitive to shifts in marginal distributions. Phi is not normed to
range from 0 to 1. For tables larger than 2-by-2, the maximum value of phi
is the square root of (k - 1), where k is the number of rows or the number
of columns, whichever is smaller. This means phi can be greater than 1.0
for larger tables; its maximum differs depending on table size and grows
without bound as tables get larger.
Cramer's
V. Cramer's V is the most popular of the chi-square-based measures of
nominal association because it norms well from 0 to 1 regardless of table
size, provided row marginals equal column marginals. V equals the square
root of chi-square divided by the product of the sample size, n, and m,
the smaller of (rows - 1) or (columns - 1): V = SQRT(chi-square/(n*m)).
Since V has a known sampling distribution it is
possible to compute its standard error and significance. SPSS and other major
packages report the significance level of the computed V value. The formula for
the variance of Cramer's V is given in Liebetrau
(1983: 15-16).
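The computation under the formula above can be sketched as follows (the helper name and table values are illustrative, not from any particular package):

```python
import math

def cramers_v(table):
    """Cramer's V = SQRT(chi-square / (n * m)), m = min(rows - 1, cols - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square from observed and expected cell counts.
    x2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
             / (row_totals[i] * col_totals[j] / n)
             for i, row in enumerate(table)
             for j, obs in enumerate(row))
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(x2 / (n * m))

# For a 2-by-2 table m = 1, so V reduces to phi:
print(round(cramers_v([[15, 10], [5, 20]]), 2))  # 0.41
```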
Interpretation: V may
be viewed as the association between two variables as a percentage of
their maximum possible variation. V-squared is the mean square canonical
correlation between the variables. For 2-by-2 tables, V = phi (hence some
packages, like Systat, print V only for larger tables).
Meaning of association:
V defines a perfect relationship as one which is predictive or ordered
monotonic, and defines a null relationship as statistical independence, as
discussed in the section on association. However, the more unequal the marginals, the further V falls below 1.0.
Symmetricalness:
V is a symmetrical measure. It does not matter which is the independent
(column) variable.
Data level: V may be
used with nominal data or higher.
Other features: V can
reach 1.0 only when the two variables have equal marginals.
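A quick numeric illustration of this property (a self-contained sketch; the helper and the tables are hypothetical):

```python
import math

def cramers_v(table):
    # Cramer's V = SQRT(chi-square / (n * m)), m = min(rows - 1, cols - 1).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = sum((obs - row_totals[i] * col_totals[j] / n) ** 2
             / (row_totals[i] * col_totals[j] / n)
             for i, row in enumerate(table)
             for j, obs in enumerate(row))
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(x2 / (n * m))

# Equal marginals with all cases on the diagonal: V reaches 1.0.
print(round(cramers_v([[10, 0, 0], [0, 10, 0], [0, 0, 10]]), 2))  # 1.0

# Unequal marginals: the table cannot be fully diagonal, so V stays below
# 1.0 even though one category perfectly predicts the other here
# (all of column 1 falls in row 1).
print(round(cramers_v([[10, 5], [0, 15]]), 2))  # 0.71
```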
Assumptions
Assumptions for each
coefficient are discussed above.
Note that measures of association,
unlike tests of significance, do not assume randomly sampled data.
Equal marginals.
Cramer's V and all measures which define a perfect relationship in terms
of strict monotonicity require that the marginal
distributions of the two variables be equal for the coefficient to reach
1.0.
Bibliography
Agresti, Alan (1996). An introduction to categorical data analysis. NY: John
Wiley and Sons.
Goodman, Leo A. and W. H. Kruskal (1954, 1959, 1963, 1972). Measures of
association for cross classifications, I, II, III, and IV. Journal of the
American Statistical Association 49: 732-764; 54: 123-163; 58: 310-364;
and 67: 415-421, respectively. The 1972 installment discusses the
uncertainty coefficient.
Liebetrau, Albert M. (1983). Measures of association. Newbury Park, CA: Sage
Publications. Quantitative Applications in the Social
Sciences Series No. 32.
Rosenberg, M. (1968). The logic of survey analysis. NY: Basic Books.