27  One-way ANOVA

One-way ANOVA (Analysis of Variance) is a statistical technique used to compare the means of three or more independent (unrelated) groups to determine whether there are statistically significant differences among them. It extends the two-sample t-test to more than two groups: rather than running many pairwise t-tests, which inflates the risk of committing a Type I error (incorrectly rejecting the null hypothesis), it performs a single overall test.

27.0.1 Purpose

The primary purpose of a one-way ANOVA is to test whether at least one group mean differs from the others, which would suggest that at least one treatment or condition has an effect not shared by all groups.

27.0.2 Assumptions

One-way ANOVA makes several key assumptions:

  1. Independence of Observations: Observations must be independent of one another, both within each group and across groups.
  2. Normality: Data in each group should be approximately normally distributed.
  3. Homogeneity of Variances: The groups should have approximately equal variances, often assessed by Levene’s Test of Equality of Variances.
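
These assumptions can be screened before running the test. The sketch below is illustrative only: g1, g2, and g3 are hypothetical group vectors, the Shapiro-Wilk and Bartlett tests ship with base R, and Levene’s test requires the car package.

# Hypothetical data: three independent groups (values for illustration only)
g1 <- c(12, 15, 14, 16, 13)
g2 <- c(18, 20, 19, 22, 21)
g3 <- c(15, 17, 16, 18, 14)
values <- c(g1, g2, g3)
groups <- factor(rep(c("G1", "G2", "G3"), each = 5))

# Normality: Shapiro-Wilk test within each group (H0: data are normal)
tapply(values, groups, shapiro.test)

# Homogeneity of variances: Bartlett's test (H0: all group variances are equal)
bartlett.test(values ~ groups)

# Levene's test, mentioned above, is available as car::leveneTest(values ~ groups)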

27.0.3 Hypotheses

The hypotheses for a one-way ANOVA are formulated as:

  • Null Hypothesis (H₀): The means of all groups are equal, implying no effect of the independent variable on the dependent variable across the groups.
  • Alternative Hypothesis (H₁): At least one group mean is different from the others, suggesting an effect of the independent variable.
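
In symbols, for \(k\) groups with population means \(\mu_1, \mu_2, \dots, \mu_k\):

\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad \text{vs.} \qquad H_1: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j \]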

27.0.4 Calculations

The analysis involves several key calculations:

  • Total Sum of Squares (SST): Measures the total variability in the dependent variable.
  • Sum of Squares Between (SSB): Reflects the variability due to differences between the group means.
  • Sum of Squares Within (SSW): Captures the variability within each group.
  • Degrees of Freedom (DF): Varies for each sum of squares; DF between = \(k - 1\) (where \(k\) is the number of groups) and DF within = \(N - k\) (where \(N\) is the total number of observations).
  • Mean Squares: Each sum of squares is divided by its respective degrees of freedom to obtain mean squares (MSB and MSW).
  • F-statistic: The ratio of MSB to MSW, which follows an F-distribution under the null hypothesis.
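
These quantities are linked by a simple identity, and the test statistic is the ratio of the two mean squares:

\[ SST = SSB + SSW, \qquad F = \frac{MSB}{MSW} = \frac{SSB/(k-1)}{SSW/(N-k)} \]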

27.0.5 Interpretation

The result of a one-way ANOVA is typically reported as an F-statistic and its corresponding p-value. The F-statistic indicates whether the variability between the group means is large relative to the variability within the groups:

  • If the F-statistic is larger than the critical value (or if the p-value is less than the significance level, typically 0.05), the null hypothesis is rejected, indicating significant differences among the means.
  • If the F-statistic is smaller than the critical value (or the p-value is greater than the significance level), the null hypothesis is not rejected, suggesting no significant difference among the group means.

27.0.6 One-way ANOVA Example Problem

A company wants to know the impact of three different selection methods on employee performance. An HR analyst chose 15 employees at random, five hired through each selection method, and recorded the sales volume achieved by each employee. The data obtained are given below.

No.   Emp Referral   Job Portals   Consultancy
1     11             17            15
2     15             18            16
3     18             21            18
4     19             22            19
5     22             27            22

At the 0.05 level of significance, do the selection methods have different effects on the performance of employees?

Calculations:

To perform a one-way ANOVA test to see if there are significant differences in the performance of employees based on their selection method (Emp Referral, Job Portals, Consultancy), we need to calculate several components including the group means, the overall mean, the sum of squares between groups (SSB), the sum of squares within groups (SSW), and the total sum of squares (SST). Additionally, we’ll calculate the F-statistic and compare it to the critical F-value from an F-distribution table.

Data Organization:

Group A (Emp Referral): \([11, 15, 18, 19, 22]\)
Group B (Job Portals): \([17, 18, 21, 22, 27]\)
Group C (Consultancy): \([15, 16, 18, 19, 22]\)

Calculate the Means for Each Group:

\[ \bar{x}_A = \frac{11 + 15 + 18 + 19 + 22}{5} = 17 \] \[ \bar{x}_B = \frac{17 + 18 + 21 + 22 + 27}{5} = 21 \] \[ \bar{x}_C = \frac{15 + 16 + 18 + 19 + 22}{5} = 18 \]

Calculate the Overall Mean:

\[ \bar{x} = \frac{11 + 15 + 18 + 19 + 22 + 17 + 18 + 21 + 22 + 27 + 15 + 16 + 18 + 19 + 22}{15} = 18.667 \]

Calculate Sum of Squares Between Groups (SSB):

\[ SSB = 5[(\bar{x}_A - \bar{x})^2 + (\bar{x}_B - \bar{x})^2 + (\bar{x}_C - \bar{x})^2] \] \[ = 5[(17 - 18.667)^2 + (21 - 18.667)^2 + (18 - 18.667)^2] \] \[ = 5[(-1.667)^2 + (2.333)^2 + (-0.667)^2] \] \[ = 5[2.778 + 5.444 + 0.444] = 5 \times 8.667 \] \[= 43.333 \]

Calculate Sum of Squares Within Groups (SSW):

\[ SSW = \sum_{i=1}^{5} (x_{Ai} - \bar{x}_A)^2 + \sum_{i=1}^{5} (x_{Bi} - \bar{x}_B)^2 + \sum_{i=1}^{5} (x_{Ci} - \bar{x}_C)^2 \] \[ = [(11-17)^2 + (15-17)^2 + (18-17)^2 + (19-17)^2 + (22-17)^2] \] \[ \;\;\;\; + [(17-21)^2 + (18-21)^2 + (21-21)^2 + (22-21)^2 + (27-21)^2] \] \[ \;\;\;\; + [(15-18)^2 + (16-18)^2 + (18-18)^2 + (19-18)^2 + (22-18)^2] \] \[ = [36 + 4 + 1 + 4 + 25 + 16 + 9 + 0 + 1 + 36 + 9 + 4 + 0 + 1 + 16] \] \[= 162 \]

Calculate the Total Sum of Squares (SST):

\[ SST = SSB + SSW = 43.333 + 162 = 205.333 \]

Calculate Mean Squares:

\[ \text{Between groups: } MSB = \frac{SSB}{k-1} = \frac{43.333}{3-1} = 21.667 \] \[ \text{Within groups: } MSW = \frac{SSW}{N-k} = \frac{162}{15-3} = 13.5 \]

Calculate F-statistic:

\[ F = \frac{MSB}{MSW} = \frac{21.667}{13.5} = 1.605 \]
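
The hand calculations above can be verified directly in base R; this quick sketch uses the same group vectors that appear later in the R section:

# Group data from the example
emp_referral <- c(11, 15, 18, 19, 22)
job_portals  <- c(17, 18, 21, 22, 27)
consultancy  <- c(15, 16, 18, 19, 22)

# Overall mean of all 15 observations
xbar <- mean(c(emp_referral, job_portals, consultancy))       # 18.667

# Sum of squares between groups (5 observations per group)
group_means <- c(mean(emp_referral), mean(job_portals), mean(consultancy))
SSB <- 5 * sum((group_means - xbar)^2)                        # 43.333

# Sum of squares within groups
SSW <- sum((emp_referral - mean(emp_referral))^2) +
  sum((job_portals - mean(job_portals))^2) +
  sum((consultancy - mean(consultancy))^2)                    # 162

# Mean squares and F-statistic
MSB <- SSB / (3 - 1)     # 21.667
MSW <- SSW / (15 - 3)    # 13.5
F_stat <- MSB / MSW      # 1.605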

Degrees of Freedom:

  1. Degrees of freedom for the numerator (df1): This corresponds to the number of groups minus one. In this example, with three groups (Emp Referral, Job Portals, Consultancy), \(df1 = 3 - 1 = 2\).
  2. Degrees of freedom for the denominator (df2): This corresponds to the total number of observations minus the number of groups. For 15 employees and 3 groups, \(df2 = 15 - 3 = 12\).
  3. Significance level (α): Typically, this is set at 0.05 for most studies, implying a 95% confidence level in the results.

27.0.7 Critical F-value Interpretation

To find the critical value, locate the entry in the F-table for \(α = 0.05\) at the intersection of \(df1 = 2\) and \(df2 = 12\). Critical F-values are provided in statistical tables available in textbooks or online resources.

For these degrees of freedom, the critical F-value at \(α = 0.05\) is approximately 3.89. Since the observed F-statistic of 1.605 is less than 3.89, you fail to reject the null hypothesis, concluding that there is no significant effect of the selection method on employee performance at the 0.05 significance level.
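
Rather than relying on a printed table, both the critical value and the exact p-value can be computed with base R’s F-distribution functions:

# Critical F-value for df1 = 2, df2 = 12 at alpha = 0.05
qf(0.95, df1 = 2, df2 = 12)                          # approximately 3.885
# Exact p-value for the observed F-statistic of 1.605
pf(1.605, df1 = 2, df2 = 12, lower.tail = FALSE)     # approximately 0.241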

This means that, based on the ANOVA results, the different selection methods do not have a statistically significant impact on employee sales performance.

27.0.8 One-way ANOVA Test in R

# Prepare the Data
emp_referral <- c(11, 15, 18, 19, 22)
job_portals <- c(17, 18, 21, 22, 27)
consultancy <- c(15, 16, 18, 19, 22)
alpha <- 0.05
# Combining the data into a single data frame
data <- data.frame(
  Sales = c(emp_referral, job_portals, consultancy),
  Method = factor(rep(c("Emp Referral", "Job Portals", "Consultancy"), each = 5))
)
data
   Sales       Method
1     11 Emp Referral
2     15 Emp Referral
3     18 Emp Referral
4     19 Emp Referral
5     22 Emp Referral
6     17  Job Portals
7     18  Job Portals
8     21  Job Portals
9     22  Job Portals
10    27  Job Portals
11    15  Consultancy
12    16  Consultancy
13    18  Consultancy
14    19  Consultancy
15    22  Consultancy
# Perform ANOVA Test
result <- aov(Sales ~ Method, data = data)

# Results
summary(result)
            Df Sum Sq Mean Sq F value Pr(>F)
Method       2  43.33   21.67   1.605  0.241
Residuals   12 162.00   13.50               
# Get the summary of the ANOVA test
summary_result <- summary(result)

# Extract the p-value
p_value <- summary_result[[1]]["Method", "Pr(>F)"]

# hypothesis decision
if (p_value < alpha) {
  cat("Reject null hypothesis\n")
} else {
  cat("Do not reject null hypothesis\n")
}
Do not reject null hypothesis

27.0.9 One-way ANOVA Test in Python

Install the statsmodels package (if it is not already available):

!pip3 install statsmodels
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Step 1: Prepare the Data
emp_referral = [11, 15, 18, 19, 22]
job_portals = [17, 18, 21, 22, 27]
consultancy = [15, 16, 18, 19, 22]
alpha = 0.05
# Combining the data into a single DataFrame
data = pd.DataFrame({
    'Sales': emp_referral + job_portals + consultancy,
    'Method': ['Emp Referral'] * 5 + ['Job Portals'] * 5 + ['Consultancy'] * 5
})
data
    Sales        Method
0      11  Emp Referral
1      15  Emp Referral
2      18  Emp Referral
3      19  Emp Referral
4      22  Emp Referral
5      17   Job Portals
6      18   Job Portals
7      21   Job Portals
8      22   Job Portals
9      27   Job Portals
10     15   Consultancy
11     16   Consultancy
12     18   Consultancy
13     19   Consultancy
14     22   Consultancy
# Step 2: Perform ANOVA Test
model = ols('Sales ~ C(Method)', data=data).fit()

# Step 3: Get the summary to see the results
result = sm.stats.anova_lm(model, typ=2)
print(result)
               sum_sq    df         F    PR(>F)
C(Method)   43.333333   2.0  1.604938  0.241176
Residual   162.000000  12.0       NaN       NaN
# Extract p-value
p_value = result.loc['C(Method)', 'PR(>F)']
# Hypothesis decision
if p_value < alpha:
    print("Reject null hypothesis")
else:
    print("Do not reject null hypothesis")
Do not reject null hypothesis

27.0.10 Example Research Articles on ANOVA

  1. Esra Emir et al. (2025)
  2. Aynur Bozkurt Bostancı & Musa Pullu (2025)