Karl Pearson’s correlation coefficient, also known as the Pearson product-moment correlation coefficient (PPMCC) or simply Pearson’s correlation, is a statistic that measures the strength and direction of the linear relationship between two variables. It’s widely used across the sciences to quantify linear association between paired datasets.
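Pearson’s correlation relies on certain assumptions about the data it is used to analyze. Commonly cited ones are:
- both variables are measured on an interval or ratio scale, and the observations come in pairs;
- the relationship between the variables is approximately linear;
- there are no significant outliers, since a few extreme points can shift \(r\) substantially;
- for significance testing, the two variables are approximately bivariate normal.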
The Pearson correlation coefficient (\(r\)) is calculated using the following formula:
\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \]
Where:
- \(n\) is the number of paired data points.
- \(x\) and \(y\) are the paired values of the two variables being correlated.
- \(\sum\) denotes summation over all \(n\) pairs, giving the totals \(\sum x\), \(\sum y\), \(\sum xy\), \(\sum x^2\), and \(\sum y^2\).
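The formula can be translated almost term by term into code. The following is a minimal sketch. The function name `pearson_r` is our own illustrative choice and not part of any library. It assumes `x` and `y` are equal-length sequences of numbers.

```python
import math

def pearson_r(x, y):
    """Pearson's r using the raw-sum formula:
    (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # One of the variables is constant, so r is undefined
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

print(pearson_r([2, 4, 6, 8, 10], [20, 40, 60, 80, 100]))  # 1.0
```

The raw-sum form is convenient for hand calculation. With large values and a small spread, however, subtracting two nearly equal large sums can lose floating-point precision. For that reason, numerical libraries generally work with mean-centred data instead.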
The value of \(r\) ranges from -1 to +1:
- +1 indicates a perfect positive linear relationship,
- -1 indicates a perfect negative linear relationship,
- 0 indicates no linear relationship (the variables may still be related in a nonlinear way).
Values close to +1 or -1 indicate a strong linear relationship, while values close to 0 indicate a weak one.
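The three reference points can be checked with NumPy's `np.corrcoef`. This is a small sketch on made-up illustrative inputs, not data from the article. An increasing straight line gives \(r = +1\) and a decreasing one gives \(r = -1\), up to floating-point rounding. The curve \(y = x^2\) over \(x\) values symmetric about 0 gives \(r = 0\), even though \(y\) is completely determined by \(x\). In other words, \(r\) measures only linear association.

```python
import numpy as np

# Hypothetical x values, symmetric about 0
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_pos = np.corrcoef(x, 3 * x + 1)[0, 1]    # increasing straight line
r_neg = np.corrcoef(x, -2 * x + 5)[0, 1]   # decreasing straight line
r_zero = np.corrcoef(x, x ** 2)[0, 1]      # perfect but nonlinear dependence

print(round(r_pos, 6), round(r_neg, 6), round(r_zero, 6))
```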
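Suppose we want to determine the relationship between hours studied and scores obtained in an exam. Here are the data for 5 students:

| Hours studied (\(x\)) | Score (\(y\)) |
|---|---|
| 2 | 20 |
| 4 | 40 |
| 6 | 60 |
| 8 | 80 |
| 10 | 100 |
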
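Using the data points provided, first calculate the sums the formula needs (here \(n = 5\)):
- \(\sum x = 2 + 4 + 6 + 8 + 10 = 30\)
- \(\sum y = 20 + 40 + 60 + 80 + 100 = 300\)
- \(\sum xy = 40 + 160 + 360 + 640 + 1000 = 2200\)
- \(\sum x^2 = 4 + 16 + 36 + 64 + 100 = 220\)
- \(\sum y^2 = 400 + 1600 + 3600 + 6400 + 10000 = 22000\)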
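The same sums can be computed programmatically as a check on the hand arithmetic. This is a minimal sketch using NumPy and the same arrays as the Python example in this article. The dictionary keys are illustrative names, not a library convention.

```python
import numpy as np

hours_studied = np.array([2, 4, 6, 8, 10])
scores = np.array([20, 40, 60, 80, 100])

n = len(hours_studied)
sums = {
    "sum_x": int(hours_studied.sum()),
    "sum_y": int(scores.sum()),
    "sum_xy": int((hours_studied * scores).sum()),
    "sum_x2": int((hours_studied ** 2).sum()),
    "sum_y2": int((scores ** 2).sum()),
}
print(n, sums)
# 5 {'sum_x': 30, 'sum_y': 300, 'sum_xy': 2200, 'sum_x2': 220, 'sum_y2': 22000}
```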
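Plugging these values into the formula gives:
\[ r = \frac{5(2200) - (30)(300)}{\sqrt{[5(220) - (30)^2][5(22000) - (300)^2]}} = \frac{11000 - 9000}{\sqrt{(1100 - 900)(110000 - 90000)}} = \frac{2000}{\sqrt{200 \times 20000}} = \frac{2000}{2000} = 1 \]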
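As a cross-check, the same result follows from the mean-centred (deviation) form of Pearson's \(r\), which is algebraically equivalent to the raw-sum formula. With \(\bar{x} = 30/5 = 6\) and \(\bar{y} = 300/5 = 60\), the deviations are \(x - \bar{x} = -4, -2, 0, 2, 4\) and \(y - \bar{y} = -40, -20, 0, 20, 40\). This gives:
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{160 + 40 + 0 + 40 + 160}{\sqrt{(16 + 4 + 0 + 4 + 16)(1600 + 400 + 0 + 400 + 1600)}} = \frac{400}{\sqrt{40 \times 4000}} = \frac{400}{400} = 1 \]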
Since \(r = 1\), there is a perfect positive linear relationship between hours studied and exam scores: every score is exactly 10 times the number of hours studied. This supports the alternative hypothesis that there is a significant linear relationship between the two variables.
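The article does not show how significance would be tested. The usual test statistic \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) divides by zero when \(r\) is exactly \(\pm 1\), so it cannot be evaluated here. One alternative, swapped in here and not necessarily the article's own method, is an exact permutation test. With only 5 students, all \(5! = 120\) orderings of the scores can be enumerated. The p-value is then the fraction of orderings whose \(r\) is at least as large as the observed one. The sketch below assumes a one-sided alternative (positive correlation).

```python
import itertools
import numpy as np

hours_studied = np.array([2, 4, 6, 8, 10])
scores = np.array([20, 40, 60, 80, 100])

observed_r = np.corrcoef(hours_studied, scores)[0, 1]

# Exact one-sided permutation test: pair the hours with every possible
# reordering of the scores and recompute r each time.
perm_rs = [np.corrcoef(hours_studied, scores[list(p)])[0, 1]
           for p in itertools.permutations(range(len(scores)))]

# Count reorderings with r at least as large as observed
# (small tolerance for floating-point ties)
count = sum(r >= observed_r - 1e-12 for r in perm_rs)
p_value = count / len(perm_rs)

print(f"observed r = {observed_r:.4f}, permutations = {len(perm_rs)}, p = {p_value:.4f}")
```

Only the original ordering reaches \(r = 1\), so the exact p-value is \(1/120 \approx 0.0083\). That is below the conventional 0.05 level, but with so few observations it is also the smallest p-value this test can produce.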
Python

```python
import numpy as np

# Data for calculation
hours_studied = np.array([2, 4, 6, 8, 10])
scores = np.array([20, 40, 60, 80, 100])

# Perform Pearson's correlation
correlation_coefficient = np.corrcoef(hours_studied, scores)[0, 1]

# Print the results
print("Pearson's correlation coefficient:", correlation_coefficient)
```

Output:

```
Pearson's correlation coefficient: 1.0
```
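The snippet above indexes the result of `np.corrcoef` with `[0, 1]`. That is because the function returns a full correlation matrix rather than a single number: entry `[i, j]` is the correlation between input variable `i` and input variable `j`. The short sketch below prints the whole matrix for the same two arrays. The diagonal holds each variable's correlation with itself (always 1), and the matrix is symmetric, so `[0, 1]` and `[1, 0]` both give Pearson's \(r\).

```python
import numpy as np

hours_studied = np.array([2, 4, 6, 8, 10])
scores = np.array([20, 40, 60, 80, 100])

# Each input is treated as one variable; the result is a 2x2 matrix
matrix = np.corrcoef(hours_studied, scores)
print(matrix.shape)  # (2, 2)
print(matrix)

# The off-diagonal entries are equal, and either one is r
r = matrix[0, 1]
assert np.isclose(r, matrix[1, 0])
```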