18  Cochran’s Q Test and Post-hoc Tests

Cochran’s Q Test

Cochran’s Q Test is a non-parametric statistical test used to determine whether the proportions of a binary outcome differ across three or more related groups or conditions. It extends the McNemar test to more than two related groups and is commonly used for repeated measures where the response variable is dichotomous. The test is useful for studies in which the same subjects are observed under different conditions, such as different time points or different treatments.

18.1 Understanding Cochran’s Q Test:

1. Null and Alternative Hypotheses:

  • Null Hypothesis (H0): The proportion of the binary outcome is the same across all groups or conditions.
  • Alternative Hypothesis (H1): The proportion of the binary outcome differs for at least one of the groups or conditions.
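In symbols, writing \(p_j\) for the probability of the outcome of interest under condition \(j\) (notation introduced here only for illustration), the two hypotheses can be sketched as:

```latex
\[
H_0:\; p_1 = p_2 = \cdots = p_k
\qquad \text{versus} \qquad
H_1:\; p_i \neq p_j \ \text{for at least one pair } (i, j)
\]
```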

2. Test Statistic:

  • The Cochran’s Q test statistic is built from two sets of totals: the number of subjects showing the characteristic of interest under each condition (column totals) and the number of conditions under which each subject shows it (row totals).
  • Under the null hypothesis, the test statistic approximately follows a chi-squared distribution with \(k - 1\) degrees of freedom, where \(k\) is the number of related groups or conditions.

3. Calculation of Test Statistic:

  • Let \(n\) be the total number of subjects and \(k\) the number of conditions. The Cochran’s Q test statistic compares the observed variation among the condition totals with the variation expected by chance.
  • The formula for Cochran’s Q test statistic is: \[ Q = (k-1)\;\frac{k\sum T_j^2 - (\sum T_j)^2}{k\sum t_i - \sum t_i^2} \] where \(T_j\) is the total number of times the characteristic appears in the \(j\)-th condition and \(t_i\) is the total number of times the characteristic appears for the \(i\)-th subject.
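As a minimal sketch of the calculation described above, the function below computes \(Q\) from a list of 0/1 rows (one row per subject, one entry per condition). The name `cochrans_q_statistic` and the four-subject data are hypothetical and chosen only for illustration. The function raises an error when no subject's responses vary across conditions, because the denominator is then zero.

```python
def cochrans_q_statistic(rows):
    """Return (Q, df) for a table of 0/1 responses.

    rows: list of equal-length lists, one per subject, one entry per condition.
    """
    k = len(rows[0])                                          # number of conditions
    col_totals = [sum(r[j] for r in rows) for j in range(k)]  # T_j
    row_totals = [sum(r) for r in rows]                       # t_i

    numerator = k * sum(T * T for T in col_totals) - sum(col_totals) ** 2
    denominator = k * sum(row_totals) - sum(t * t for t in row_totals)
    if denominator == 0:
        raise ValueError("Q is undefined: no subject's responses vary across conditions")
    return (k - 1) * numerator / denominator, k - 1


# Hypothetical data: 4 subjects, 3 conditions
toy = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]
q, df = cochrans_q_statistic(toy)
print(f"Q = {q:.4f}, df = {df}")
```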

4. Interpretation of Results:

  • If the calculated \(Q\) value is greater than the critical value from the chi-squared distribution with \(k - 1\) degrees of freedom at the chosen significance level (commonly \(\alpha = 0.05\)), the null hypothesis is rejected, indicating that the proportions differ across conditions.
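The same decision can be reached through the p-value: reject the null hypothesis when the upper-tail chi-squared probability of the observed \(Q\) falls below the significance level. The sketch below uses only the Python standard library. `chi2_sf` and `decide` are hypothetical helper names (in practice `scipy.stats.chi2.sf` is the usual choice), and `chi2_sf` handles positive integer degrees of freedom only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    z = x / 2.0
    # Closed-form starting points: df = 2 gives exp(-z); df = 1 gives erfc(sqrt(z)).
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-z)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(z))
    # Each step raises df by 2 using Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1).
    while a < df / 2.0:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return tail


def decide(q, k, alpha=0.05):
    """Return (p_value, reject_null) for an observed Q with k conditions."""
    p_value = chi2_sf(q, k - 1)
    return p_value, p_value < alpha


# Illustrative call: an observed Q of 7 with k = 3 conditions
p, reject = decide(7.0, 3)
print(f"p = {p:.4f}, reject H0: {reject}")
```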

18.2 Applications of Cochran’s Q Test:

A. Medical Research:

  • In clinical trials, the Cochran’s Q test is used to assess the consistency of treatment effects observed at different time points or under different conditions within the same group of patients.

B. Psychology:

  • Psychologists may apply the Cochran’s Q test to evaluate the consistency of binary responses (like success or failure) across repeated measures or different experimental conditions.

C. Quality Control:

  • In industrial settings, the Cochran’s Q test can be used to compare the pass/fail rates of products or processes across different shifts or batches.

D. Social Sciences:

  • It’s used in the social sciences to analyze binary outcomes of surveys or experiments that are repeated across multiple conditions or times.

Considerations:

  • Cochran’s Q test assumes that subjects are independent of one another; the repeated observations within a subject are related, which is the dependence the test is designed to handle.
  • The test is only applicable to binary (dichotomous) outcomes.
  • With small samples the chi-squared approximation can be unreliable and the test may lose power; alternative methods, such as exact or permutation versions of the test, should be considered in such cases.
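One alternative sometimes used when the chi-squared approximation is doubtful is a permutation version of the test: under the null hypothesis each subject's responses are exchangeable across conditions, so shuffling within rows generates a reference distribution for \(Q\). The following is a minimal Monte Carlo sketch under that assumption, not a method prescribed by this chapter. The function names, the data, the seed, and the number of shuffles are all illustrative choices.

```python
import random


def q_stat(rows):
    """Cochran's Q for a list of 0/1 rows (subjects x conditions); 0.0 if undefined."""
    k = len(rows[0])
    col = [sum(r[j] for r in rows) for j in range(k)]
    row = [sum(r) for r in rows]
    den = k * sum(row) - sum(t * t for t in row)
    if den == 0:
        return 0.0
    return (k - 1) * (k * sum(T * T for T in col) - sum(col) ** 2) / den


def permutation_p_value(rows, n_shuffles=5000, seed=0):
    """Monte Carlo p-value: share of within-row shuffles with Q >= the observed Q."""
    rng = random.Random(seed)
    observed = q_stat(rows)
    hits = 0
    for _ in range(n_shuffles):
        # Permute each subject's responses independently across conditions.
        shuffled = [rng.sample(r, len(r)) for r in rows]
        if q_stat(shuffled) >= observed - 1e-12:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical data: 6 subjects, 3 conditions
data = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0], [0, 1, 0]]
print(permutation_p_value(data))
```

Fixing the seed makes the estimate reproducible; a larger `n_shuffles` reduces its Monte Carlo error.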

In summary, Cochran’s Q test offers a straightforward method for analyzing differences in binary outcomes across more than two related groups or conditions. It is especially valuable for repeated-measures designs where the same subjects are exposed to different conditions, allowing researchers to investigate the consistency of an effect or response across those conditions.

18.3 Example Problem: Cochran’s Q test

A software company wants to test the reliability of three versions of a software application (Version A, Version B, and Version C) under the same conditions. Ten testers each test all three versions for reliability. The outcome is binary: Pass (if the software version works reliably during the test) or Fail (if it does not). The results are as follows:

Tester   Version A   Version B   Version C
1        1           0           0
2        1           0           0
3        1           0           0
4        1           0           0
5        1           0           0
6        1           1           0
7        1           1           0
8        1           1           1
9        1           1           1
10       0           1           1

(Pass=1, Fail=0)

The company wants to know if there is a significant difference in reliability between the three software versions.

18.3.1 Calculation of Cochran’s Q Test:

  1. Calculate the totals for each version (sum across testers):

    • \(T_A = 1+1+1+1+1+1+1+1+1+0 = 9\)
    • \(T_B = 0+0+0+0+0+1+1+1+1+1 = 5\)
    • \(T_C = 0+0+0+0+0+0+0+1+1+1 = 3\)
  2. Calculate the totals for each tester (sum across versions):

    • \(t_1 = 1 + 0 + 0 = 1\)
    • \(t_2 = 1 + 0 + 0 = 1\)
    • \(t_3 = 1 + 0 + 0 = 1\)
    • \(t_4 = 1 + 0 + 0 = 1\)
    • \(t_5 = 1 + 0 + 0 = 1\)
    • \(t_6 = 1 + 1 + 0 = 2\)
    • \(t_7 = 1 + 1 + 0 = 2\)
    • \(t_8 = 1 + 1 + 1 = 3\)
    • \(t_9 = 1 + 1 + 1 = 3\)
    • \(t_{10} = 0 + 1 + 1 = 2\)
  3. Compute the sums needed for the Q statistic:

    • \(\sum T_j^2 = 9^2 + 5^2 + 3^2 = 115\)
    • \((\sum T_j)^2 = (9+5+3)^2 = 17^2 = 289\)
    • \(\sum t_i = 1+1+1+1+1+2+2+3+3+2 = 17\)
    • \(\sum t_i^2 = 1^2+1^2+1^2+1^2+1^2+2^2+2^2+3^2+3^2+2^2 = 35\)
    • \(k\sum t_i - \sum t_i^2 = 3 \times 17 - 35 = 16\)
    • Number of versions: \(k = 3\)
  4. Calculate the Q statistic: \[ Q = (k-1)\;\frac{k\sum T_j^2 - (\sum T_j)^2}{k\sum t_i - \sum t_i^2} \]

    Substituting values:

\[ Q = (3-1)\;\frac{3(115) - 289}{16} = 2 \times \frac{345 - 289}{16} = 2 \times \frac{56}{16} = 7 \]

  5. Result:
    • \(Q = 7\) with \(df = k-1 = 2\)
    • Referring \(Q = 7\) to the \(\chi^2\) distribution with 2 degrees of freedom gives \(p \approx 0.03\).
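For \(df = 2\) the chi-squared upper-tail probability has a closed form, so the p-value can be checked by hand without a table:

```latex
\[
p = P\left(\chi^2_2 \ge Q\right) = e^{-Q/2} = e^{-7/2} \approx 0.0302
\]
```

By the same formula, the 5% critical value for \(df = 2\) is \(-2\ln(0.05) \approx 5.991\); \(Q = 7\) exceeds it, which agrees with \(p < 0.05\).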

Interpretation

Since \(p < 0.05\), the Cochran’s Q test indicates a statistically significant difference in reliability among the three software versions.

Conclusion

At least one software version differs in reliability. The company can conclude that the performance of the software versions is not the same, and some versions are more reliable than others.

Cochran’s Q Test in R and Python

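Cochran's Q Test via Friedman Test in R
# For 0/1 data the tie-corrected Friedman statistic equals Cochran's Q,
# so friedman.test() reproduces the hand calculation above.
# Create the data matrix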
software_data <- matrix(c(
  1, 0, 0,
  1, 0, 0,
  1, 0, 0,
  1, 0, 0,
  1, 0, 0,
  1, 1, 0,
  1, 1, 0,
  1, 1, 1,
  1, 1, 1,
  0, 1, 1
), nrow = 10, byrow = TRUE)

colnames(software_data) <- c("Version A", "Version B", "Version C")
rownames(software_data) <- paste("Tester", 1:10)

print(software_data)
          Version A Version B Version C
Tester 1          1         0         0
Tester 2          1         0         0
Tester 3          1         0         0
Tester 4          1         0         0
Tester 5          1         0         0
Tester 6          1         1         0
Tester 7          1         1         0
Tester 8          1         1         1
Tester 9          1         1         1
Tester 10         0         1         1
# Perform the test using friedman.test() 

result <- friedman.test(software_data)


print(result)

    Friedman rank sum test

data:  software_data
Friedman chi-squared = 7, df = 2, p-value = 0.0302
# Interpret the result 
alpha <- 0.05
p <- result$p.value

  
if (p < alpha) {
  cat("(P-value)",  p, "<", alpha, 
      "(significance level / alpha).",  "\n",
      " Reject the null hypothesis: 
      There is a significant difference.\n")
} else {
  cat("(P-value)",  p, ">=", alpha, 
      "(significance level / alpha).", "\n",
      " Do not Reject the null hypothesis: 
      No significant difference.\n")
}
(P-value) 0.03019738 < 0.05 (significance level / alpha). 
  Reject the null hypothesis: 
      There is a significant difference.
Cochran's Q Test via Friedman Test in Python
import numpy as np
import pandas as pd
from scipy import stats

#Create and display the data matrix 

software_data = pd.DataFrame(
    data=[
      [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0],
      [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1], [0, 1, 1]
    ],
    index=[f"Tester {i}" for i in range(1, 11)],
    columns=["Version A", "Version B", "Version C"]
)

print(software_data)
           Version A  Version B  Version C
Tester 1           1          0          0
Tester 2           1          0          0
Tester 3           1          0          0
Tester 4           1          0          0
Tester 5           1          0          0
Tester 6           1          1          0
Tester 7           1          1          0
Tester 8           1          1          1
Tester 9           1          1          1
Tester 10          0          1          1
# Perform the test using scipy.stats.friedmanchisquare() 

software_data_array = software_data.to_numpy()

result = stats.friedmanchisquare(
    software_data_array[:, 0],  # Column for Version A
    software_data_array[:, 1],  # Column for Version B
    software_data_array[:, 2]   # Column for Version C
)

# Access specific values
chi2 = result.statistic
p = result.pvalue

# Round and print them
print(f"Chi2 Statistic: {chi2:.4f}")
Chi2 Statistic: 7.0000
print(f"P-value: {p:.4f}")
P-value: 0.0302
# Interpret the result
alpha = 0.05

if p < alpha:
    print(f"(P-value) {p:.4f} < {alpha} (sig level / alpha).")
    print("Reject the null hypothesis:", "\n", "There is a significant difference.\n")
else:
    print(f"(P-value) {p:.4f} >= {alpha} (significance level / alpha).")
    print("Do not reject the null hypothesis:", "\n", "No significant difference.\n")
(P-value) 0.0302 < 0.05 (sig level / alpha).
Reject the null hypothesis: 
 There is a significant difference.

18.4 Post-hoc Tests for Cochran’s Q Test:

After performing Cochran’s Q test, if the result is significant, it implies that there are differences in the binary outcomes across the related groups. However, Cochran’s Q test does not specify which groups differ from each other. To identify the specific groups between which these differences occur, you would perform post-hoc pairwise comparisons.

For Cochran’s Q test, one common approach for post-hoc analysis is to use pairwise comparisons with a Bonferroni correction to adjust for multiple testing.
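Each pairwise comparison looks only at the subjects whose two responses disagree. As a rough illustration, the sketch below computes the continuity-corrected McNemar statistic \((|b - c| - 1)^2 / (b + c)\) for every pair of versions in the tester data from Section 18.3, where \(b\) and \(c\) are the two discordant counts. The helper name `mcnemar_cc` is hypothetical, returning a p-value of 1 when there are no discordant pairs is a convention adopted here, and the p-value uses the one-degree-of-freedom identity \(P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})\) rather than a statistics library.

```python
import math
from itertools import combinations

# Tester data from Section 18.3 (rows = testers; columns = Version A, B, C)
data = [
    [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0],
    [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1], [0, 1, 1],
]
names = ["Version A", "Version B", "Version C"]


def mcnemar_cc(x, y):
    """Continuity-corrected McNemar statistic and p-value for paired 0/1 lists."""
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)  # pass on x only
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)  # pass on y only
    if b + c == 0:
        return 0.0, 1.0  # convention used here: no discordant pairs, no evidence
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


results = {}
for i, j in combinations(range(len(names)), 2):
    col_i = [row[i] for row in data]
    col_j = [row[j] for row in data]
    results[(names[i], names[j])] = mcnemar_cc(col_i, col_j)

for (first, second), (stat, p_value) in results.items():
    print(f"{first} vs {second}: chi2 = {stat:.4f}, p = {p_value:.4f}")
```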

The code below runs pairwise McNemar tests on the software-version data from Section 18.3; the p-values it prints are unadjusted.

Post-hoc Tests for Cochran’s Q Test in R and Python

Post-hoc for Cochran’s Q Test with pairwise McNemar tests in R
# mcnemar.test() is part of base R's stats package, so no extra library is needed

# Significance level
alpha <- 0.05

# Check if Cochran's Q Test was significant
if(result$p.value < alpha){
  cat("Cochran's Q Test is significant:", "\n", "performing post-hoc pairwise McNemar tests.\n\n")
  
  # Generate all pairs of software versions
  version_pairs <- combn(ncol(software_data), 2, simplify = FALSE)
  
  for(pair in version_pairs){
    v1 <- pair[1]
    v2 <- pair[2]
    
    # Create 2x2 contingency table for the pair
    version_pair <- table(software_data[,v1], software_data[,v2])
    
    # McNemar test
    test <- mcnemar.test(version_pair)
    
    # Post-hoc decision
    decision <- if(test$p.value < alpha) "Significant difference" else "No significant difference"
    
    cat(colnames(software_data)[v1], "vs", colnames(software_data)[v2], ":\n")

    print(test)
  }
  
} else {
  cat("Cochran's Q Test is not significant: post-hoc tests are not applicable.\n")
}
Cochran's Q Test is significant: 
 performing post-hoc pairwise McNemar tests.

Version A vs Version B :

    McNemar's Chi-squared test with continuity correction

data:  version_pair
McNemar's chi-squared = 1.5, df = 1, p-value = 0.2207

Version A vs Version C :

    McNemar's Chi-squared test with continuity correction

data:  version_pair
McNemar's chi-squared = 3.125, df = 1, p-value = 0.0771

Version B vs Version C :

    McNemar's Chi-squared test with continuity correction

data:  version_pair
McNemar's chi-squared = 0.5, df = 1, p-value = 0.4795
Post-hoc for Cochran’s Q Test with pairwise McNemar tests in Python
import itertools
import pandas as pd
from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

# Significance level
alpha = 0.05

# --- Cochran's Q Test ---
# Pass the entire dataframe (each column = related sample)
result = cochrans_q(software_data)


#Post-hoc pairwise McNemar tests if significant 
if result.pvalue < alpha:
    print("Cochran’s Q Test is significant:")
    print("Performing post-hoc pairwise McNemar tests...\n")

    # Generate all unique pairs of software versions
    version_pairs = list(itertools.combinations(software_data.columns, 2))

    for v1, v2 in version_pairs:
        # Create a 2x2 contingency table
        table = pd.crosstab(software_data[v1], software_data[v2])
        table = table.reindex(index=[0, 1], columns=[0, 1], fill_value=0)

        # Perform McNemar test
        test = mcnemar(table, exact=False, correction=True)

        # Decision
        decision = "Significant difference" if test.pvalue < alpha else "No significant difference"

        # Output results
        print(f"{v1} vs {v2}:")
        print(f"  McNemar's Chi2 = {test.statistic:.4f}, P-value = {test.pvalue:.4f}")
        print(f"  → {decision}\n")

else:
    print("Cochran’s Q Test is not significant: post-hoc tests are not applicable.")
Cochran’s Q Test is significant:
Performing post-hoc pairwise McNemar tests...

Version A vs Version B:
  McNemar's Chi2 = 1.5000, P-value = 0.2207
  → No significant difference

Version A vs Version C:
  McNemar's Chi2 = 3.1250, P-value = 0.0771
  → No significant difference

Version B vs Version C:
  McNemar's Chi2 = 0.5000, P-value = 0.4795
  → No significant difference
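
To apply the Bonferroni correction mentioned at the start of this section, each pairwise p-value is multiplied by the number of comparisons (and capped at 1) before it is compared with \(\alpha\). A minimal sketch using the three p-values from the output above; the variable names are illustrative, and `statsmodels.stats.multitest.multipletests` with `method="bonferroni"` offers the same adjustment as a library call.

```python
alpha = 0.05

# McNemar p-values copied from the output above
raw_p = {
    "Version A vs Version B": 0.2207,
    "Version A vs Version C": 0.0771,
    "Version B vs Version C": 0.4795,
}

m = len(raw_p)  # number of pairwise comparisons
adjusted_p = {pair: min(1.0, p * m) for pair, p in raw_p.items()}

for pair, p_adj in adjusted_p.items():
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"{pair}: adjusted p = {p_adj:.4f} ({verdict})")
```

With these inputs none of the adjusted p-values falls below 0.05, so the correction does not change the pairwise conclusions shown above.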